2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum: Latest Papers

Message from the HCW Steering Committee Chair
B. Shirazi
{"title":"Message from the HCW Steering Committee Chair","authors":"B. Shirazi","doi":"10.1109/IPDPSW.2016.220","DOIUrl":"https://doi.org/10.1109/IPDPSW.2016.220","url":null,"abstract":"These are the proceedings of the “22nd Heterogeneity in Computing Workshop,” also known as HCW 2013. A few years ago, the title of the workshop was changed from the original title of “Heterogeneous Computing Workshop” to reflect the breadth of the impact of heterogeneity, as well as to stress that the focus of the workshop is on the management and exploitation of heterogeneity. All of this is, of course, taken in the context of the parent conference, the International Parallel and Distributed Processing Symposium (IPDPS), and so explores heterogeneity in parallel and distributed computing systems.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123911692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unstructured Control Flow in GPGPU
Rodrigo Dominguez, D. Kaeli
{"title":"Unstructured Control Flow in GPGPU","authors":"Rodrigo Dominguez, D. Kaeli","doi":"10.1109/IPDPSW.2013.247","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.247","url":null,"abstract":"The current trend toward heterogeneous architectures motivates us to reconsider current software and hardware paradigms. The focus is centered around new parallel programming models, compiler design, and runtime resource management techniques to exploit the features of many-core processor architectures. Graphics Processing Units (GPU) have become the platform of choice in this area for accelerating a large range of data-parallel and task-parallel applications. The rapid adoption of GPU computing has been greatly aided by the introduction of high-level programming environments such as CUDA C and OpenCL. However, each vendor implements these programming models differently and we must analyze the internals in order to get a better understanding of the performance results. One of the main differences across implementations is the handling of program control flow by the compiler and the hardware. Some implementations can support unstructured control flow based on branches and labels; others are based on structured control flow relying solely on if-then and while constructs. In this paper we describe a tool that can be used to analyze the difference between these two approaches. We created a dynamic compiler called Caracal that translates applications with unstructured control flow so they can run on hardware that requires structured programs. In order to accomplish this, Caracal builds a control tree of the program and creates single-entry, single-exit regions called hammock graphs. We used this tool to analyze the performance differences between NVIDIA's implementation of CUDA C and AMD's implementation of OpenCL. We found that the requirement for structured control flow can increase the number of registers allocated by 20 registers and impact performance as much as 2x.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115477776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Identifying High Betweenness Centrality Vertices in Large Noisy Networks
Vladimir Ufimtsev, S. Bhowmick
{"title":"Identifying High Betweenness Centrality Vertices in Large Noisy Networks","authors":"Vladimir Ufimtsev, S. Bhowmick","doi":"10.1109/IPDPSW.2013.171","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.171","url":null,"abstract":"Most real-world network models inherently include some degree of noise due to the approximations involved in measuring real-world data. My thesis focuses on studying how these approximations affect the stability of the networks. In this paper, we focus on the stability of betweenness centrality (BC), a metric used to measure the importance of the vertices in the network. We present our results on how the ranking of the vertices changes as the networks are perturbed and introduce a group testing algorithm that we developed that can correctly identify the high-valued BC vertices of stable networks in less time than the traditional approaches.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123074618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Avoiding Locks and Atomic Instructions in Shared-Memory Parallel BFS Using Optimistic Parallelization
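The abstract above treats betweenness centrality (BC) as the metric whose stability is studied. As a point of reference, here is a minimal sketch of Brandes' standard algorithm for unweighted, undirected graphs. It is not the authors' group testing algorithm, which the abstract does not specify; the function name `betweenness` and the adjacency-dict input format are assumptions made for illustration.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each vertex to a list of its neighbours (hypothetical format).
    Returns a dict of unnormalised betweenness centrality values.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        sigma[s] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []  # vertices in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}
```

Ranking vertices by the returned values, before and after perturbing `adj`, is one simple way to observe the kind of rank change the abstract describes.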
Jesmin Jahan Tithi, Dhruv Mátáni, Gaurav Menghani, R. Chowdhury
{"title":"Avoiding Locks and Atomic Instructions in Shared-Memory Parallel BFS Using Optimistic Parallelization","authors":"Jesmin Jahan Tithi, Dhruv Mátáni, Gaurav Menghani, R. Chowdhury","doi":"10.1109/IPDPSW.2013.241","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.241","url":null,"abstract":"Dynamic load-balancing in parallel algorithms typically requires locks and/or atomic instructions for correctness. We have shown that sometimes an optimistic parallelization approach can be used to avoid the use of locks and atomic instructions during dynamic load balancing. In this approach one allows potentially conflicting operations to run in parallel with the hope that everything will run without conflicts, and if any occasional inconsistencies arise due to conflicts, one will be able to handle them without hampering the overall correctness of the program. We have used this approach to implement two new types of high-performance lock-free parallel BFS algorithms and their variants based on centralized job queues and distributed randomized work-stealing, respectively. These algorithms are implemented using Intel Cilk++, and shown to be scalable and faster than two state-of-the-art multicore parallel BFS algorithms by Leiserson and Schardl (SPAA, 2010) and Hong et al. (PACT, 2011), where the algorithm described in the first paper is also free of locks and atomic instructions but does not use optimistic parallelization. Our implementations can also very efficiently handle scale-free graphs, which frequently arise in real-world scenarios such as the World Wide Web, social networks, biological interaction networks, etc.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115764325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Semi-Matching Algorithms for Scheduling Parallel Tasks under Resource Constraints
Anne Benoit, Johannes Langguth, B. Uçar
{"title":"Semi-Matching Algorithms for Scheduling Parallel Tasks under Resource Constraints","authors":"Anne Benoit, Johannes Langguth, B. Uçar","doi":"10.1109/IPDPSW.2013.30","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.30","url":null,"abstract":"We study the problem of minimum makespan scheduling when tasks are restricted to subsets of the processors (resource constraints), and require either one or multiple distinct processors to be executed (parallel tasks). This problem is related to the minimum makespan scheduling problem on unrelated machines, as well as to the concurrent job shop problem, and it amounts to finding a semi-matching in bipartite graphs or hypergraphs. The problem is known to be NP-complete for bipartite graphs with general vertex (task) weights, and solvable in polynomial time for unweighted graphs (i.e., unit-weight tasks). We prove that the problem is NP-complete for hypergraphs even in the unweighted case. We design several greedy algorithms of low complexity to solve two versions of the problem, and assess their performance through a set of exhaustive simulations. Even though there is no approximation guarantee for these low-complexity algorithms, they return solutions close to the optimal (or a known lower bound) on average.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121101021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
ASHES Introduction
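The abstract above casts restricted-processor scheduling as a semi-matching problem and mentions low-complexity greedy algorithms without describing them. The following is a minimal sketch of one plausible greedy heuristic for the bipartite case (each task needs a single processor). It is an illustrative assumption, not one of the authors' algorithms; the names `greedy_semi_matching`, `weights`, and `eligible` are hypothetical.

```python
def greedy_semi_matching(weights, eligible, num_procs):
    """Assign each task to its currently least-loaded eligible processor.

    weights[t]  -- execution time of task t
    eligible[t] -- processor ids that task t may run on
    Returns (assignment, makespan), where assignment maps task -> processor.
    """
    loads = [0] * num_procs
    assignment = {}
    # Ordering heuristic (an assumption): place the heaviest tasks first.
    for t in sorted(range(len(weights)), key=lambda t: -weights[t]):
        p = min(eligible[t], key=lambda q: loads[q])
        assignment[t] = p
        loads[p] += weights[t]
    return assignment, max(loads)
```

For example, `greedy_semi_matching([3, 2, 2], [[0, 1], [0], [1]], 2)` restricts task 1 to processor 0 and task 2 to processor 1, so only task 0 has a choice. Like the heuristics the abstract describes, this sketch carries no approximation guarantee.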
Jiayuan Meng
{"title":"ASHES Introduction","authors":"Jiayuan Meng","doi":"10.1109/IPDPSW.2013.290","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.290","url":null,"abstract":"","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124829704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Application of Evolutionary Algorithms to Maximum Lifetime Coverage Problem in Wireless Sensor Networks
A. Tretyakova, F. Seredyński
{"title":"Application of Evolutionary Algorithms to Maximum Lifetime Coverage Problem in Wireless Sensor Networks","authors":"A. Tretyakova, F. Seredyński","doi":"10.1109/IPDPSW.2013.96","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.96","url":null,"abstract":"The paper analyzes three recently proposed algorithms, which differ not only in the method of finding a solution to the Maximum Lifetime Coverage problem in Wireless Sensor Networks (WSN), but also in their approach to the statement of the problem. To compare the algorithms, (1) they were adapted to common assumptions that correspond to real characteristics of sensor networks and (2) a special sensor network simulator was used to study them. The paper presents the results of an experimental study and shows how the lifetime of a WSN depends on different sets of algorithm parameters and WSN parameters. Based on the results of the experiments, conclusions about the validity of the assumptions of algorithms, quality of solutions and possible improvements are drawn.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124895301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Scalable, Multithreaded, Partially-in-Place Sorting
D. Haglin, Robert Adolf, Greg E. Mackey
{"title":"Scalable, Multithreaded, Partially-in-Place Sorting","authors":"D. Haglin, Robert Adolf, Greg E. Mackey","doi":"10.1109/IPDPSW.2013.74","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.74","url":null,"abstract":"A recent trend in hardware development is producing computing systems that are stretching the number of cores and size of shared-memory beyond where most fundamental serial algorithms perform well. The expectation is that this trend will continue. So it makes sense to rethink our fundamental algorithms such as sorting. There are many situations where data that needs to be sorted will actually fit into the shared memory so applications could benefit from an efficient parallel sorting algorithm. When sorting large data (at least hundreds of Gigabytes) in a single shared memory, there are two factors that affect the algorithm choice. First, does the algorithm sort in-place? And second, does the algorithm scale well beyond tens of threads? Surprisingly, existing algorithms possess either one of these factors, but not both. We present an approach that gracefully degrades in performance as the amount of available working memory decreases relative to the size of the input.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125036119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Efficient Hough Transform on the FPGA using DSP Slices and Block RAMs
Xin Zhou, Norihiro Tomagou, Yasuaki Ito, K. Nakano
{"title":"Efficient Hough Transform on the FPGA using DSP Slices and Block RAMs","authors":"Xin Zhou, Norihiro Tomagou, Yasuaki Ito, K. Nakano","doi":"10.1109/IPDPSW.2013.86","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.86","url":null,"abstract":"The main contribution of this paper is to present a new FPGA architecture for the Hough transform that identifies straight lines in a binary image. Recent FPGAs have hundreds of embedded DSP slices and block RAMs. For example, Xilinx Virtex-6 Family FPGAs have a DSP48E1 slice, which is a configurable logic block equipped with fast multipliers, adders, pipeline registers, and so on. They also have a dual-port memory with 18 Kbits as a block RAM. One of the key techniques for accelerating computation using FPGAs is efficient use of DSP slices and block RAMs. Our new architecture for the Hough transform uses 178 DSP48E1 slices and 180 block RAMs with 18 Kbits that work in parallel. As far as we know, there is no previously published work that fully utilizes DSP slices and block RAMs for the Hough transform. Roughly speaking, a conventional sequential implementation performs 180m voting operations for m edge points. Our architecture performs voting operations in parallel, and outputs identified straight lines in m+97 clock cycles. Since 180m voting operations are performed using 178 DSP48E1 slices, the lower bound of the computing time is m clock cycles. Hence our implementation is close to optimal. The implementation results show that the Hough transform for a 512×512 image with 33232 edge points can be done in only 135.75 µs.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123307230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Revisiting the Double Checkpointing Algorithm
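The abstract above contrasts a sequential Hough transform, which casts 180 votes per edge point, with a parallel design that finishes in m+97 clock cycles. The sketch below shows the sequential voting loop in software and derives the clock rate implied by the two quoted figures. It is a plain illustration, not the authors' FPGA architecture; the function names, the rounding of rho to the nearest integer, and the one-degree angle step are assumptions.

```python
import math


def hough_votes(edge_points, num_angles=180):
    """Sequential Hough voting: each edge point casts one vote per angle.

    Returns a dict mapping (angle_index, rho) to its vote count.
    """
    votes = {}
    for x, y in edge_points:
        for k in range(num_angles):
            theta = math.pi * k / num_angles  # one-degree steps (assumed)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(k, rho)] = votes.get((k, rho), 0) + 1
    return votes


def implied_clock_mhz(num_edge_points, latency=97, time_us=135.75):
    """Clock rate implied by finishing m + latency cycles in time_us."""
    return (num_edge_points + latency) / time_us
```

With m edge points the loop performs 180m votes, which matches the sequential operation count quoted in the abstract. Calling `implied_clock_mhz(33232)` divides 33329 cycles by 135.75 µs; the result is a derived figure, not one stated in the source.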
J. Dongarra, T. Hérault, Y. Robert
{"title":"Revisiting the Double Checkpointing Algorithm","authors":"J. Dongarra, T. Hérault, Y. Robert","doi":"10.1109/IPDPSW.2013.11","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.11","url":null,"abstract":"Fast checkpointing algorithms require distributed access to stable storage. This paper revisits the approach based upon double checkpointing, and compares the blocking algorithm of Zheng, Shi and Kalé, with the non-blocking algorithm of Ni, Meneses and Kalé, in terms of both performance and risk. We also extend the model they proposed to assess the impact of the overhead associated with non-blocking communications. We then provide a new peer-to-peer checkpointing algorithm, called the triple checkpointing algorithm, that can work at constant memory, and achieves both higher efficiency and better risk handling than the double checkpointing algorithm. We provide performance and risk models for all the evaluated protocols, and compare them through comprehensive simulations.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125284043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14