{"title":"Reducing the Environmental Impact of Optical Networks","authors":"Thilo Schondienst, V. Vokkarane","doi":"10.1109/IPDPSW.2013.46","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.46","url":null,"abstract":"With increased energy-efficiency of any commodity the total energy consumed typically actually rises. We study the potential reduction of the environmental impact by smart configuration of optical networks in combination with renewable energy sources. We present approaches to effectively cut down emission of greenhouse gases, instead of simply improving energy-efficiency.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133643587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tabulated Equations of State with a Many-tasking Execution Model","authors":"Matthew Anderson, M. Brodowicz, T. Sterling, Hartmut Kaiser, Bryce Adelstein-Lelbach","doi":"10.1109/IPDPSW.2013.162","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.162","url":null,"abstract":"The addition of nuclear and neutrino physics to general relativistic fluid codes allows for a more realistic description of hot nuclear matter in neutron star and black hole systems. This additional microphysics requires that each processor have access to large tables of data, such as equations of state, and in large simulations, the memory required to store these tables locally can become excessive unless an alternative execution model is used. In this work we present relativistic fluid evolutions of a neutron star obtained using a message driven multi-threaded execution model known as ParalleX. The goal of this work is to reduce the negative performance impact of distributing the tables. We introduce a component based on the notion of a \"future\", or non-blocking encapsulated delayed computation, for accessing large tables of data, including out-of-core sized tables. The proposed technique does not impose substantial memory overhead and can hide increased network latency.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133141425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimizing Remote Accesses in MapReduce Clusters","authors":"Prateek Tandon, Michael J. Cafarella, T. Wenisch","doi":"10.1109/IPDPSW.2013.195","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.195","url":null,"abstract":"MapReduce, in particular Hadoop, is a popular framework for the distributed processing of large datasets on clusters of relatively inexpensive servers. Although Hadoop clusters are highly scalable and ensure data availability in the face of server failures, their efficiency is poor. We study data placement as a potential source of inefficiency. Despite networking improvements that have narrowed the performance gap between map tasks that access local or remote data, we find that nodes servicing remote HDFS requests see significant slowdowns of collocated map tasks due to interference effects, whereas nodes making these requests do not experience proportionate slowdowns. To reduce remote accesses, and thus avoid their destructive performance interference, we investigate an intelligent data placement policy we call 'partitioned data placement'. We find that, in an unconstrained cluster where a job's map tasks may be scheduled dynamically on any node over time, Hadoop's default random data placement is effective in avoiding remote accesses. However, when task placement is restricted by long-running jobs or other reservations, partitioned data placement substantially reduces remote access rates (e.g., by as much as 86% over random placement for a job allocated only one-third of a cluster).","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128852544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Flexible and Fast Routing Strategies for Dynamic Network Provisioning","authors":"Liudong Zuo, Mengxia Zhu","doi":"10.1109/IPDPSW.2013.50","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.50","url":null,"abstract":"Reserving bandwidth as needed in high-performance networks makes the fast and reliable data transfer with guaranteed performance possible in large-scale collaborative e-science. Besides the notification of acceptance or rejection for a particular reservation request, users normally want to know the earliest possible finish time or the minimum total transfer duration for the data transfer. Several routing algorithms have been proposed to achieve such desired goals given the data size, the data available time, and the deadline to finish the data transfer. Instead of directly processing the bandwidth reservation request (BRR) from users, our approach analyses various parameters to strategically narrow down the solution search space for fast system response. Adapted from some previous works, two algorithms are proposed to compute the reservation options with the earliest completion time (ECT) and with the shortest duration (SD) for multiple BRRs accumulated during a certain period. Extensive simulation results demonstrate the superiority of the proposed algorithms in terms of reduced execution time and improved success ratio of BRRs in comparison with existing scheduling algorithms.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124142371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Scheduling Model for Broadcasting in Wireless Sensor Networks","authors":"Hongju Cheng, N. Xiong, Xingbo Huang, L. Yang","doi":"10.1109/IPDPSW.2013.88","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.88","url":null,"abstract":"Energy efficiency is especially important to the broadcasting operation in wireless sensor networks. It helps to reduce the energy consumption by minimizing the number of relay nodes during the broadcast process in case that the transmission range is identical to all nodes in the network. In this paper, we have introduced an efficient heuristic algorithm EMCDS to build the Minimum Connected Dominating Set with the proposed ordered sequence list. The simulation results show that the proposed EMCDS algorithm can find smaller CDS compared with related works.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114615105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AzureBOT: A Framework for Bag-of-Tasks Applications on the Azure Cloud Platform","authors":"Dinesh Agarwal, S. Prasad","doi":"10.1109/IPDPSW.2013.261","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.261","url":null,"abstract":"Windows Azure is an emerging cloud platform that provides application developers with APIs to write scientific and commercial applications. However, the steep learning curve to understand the unique architecture of the cloud platforms in general and continuously changing Azure APIs specifically, make it difficult for the application developers to write cloud based applications. During our extensive experience with Azure cloud platform over the past few years, we have identified the need of a framework to abstract the complexities of working with the Azure cloud platform. Such a framework is essential for adoption of cloud technologies. Therefore, we have created AzureBOT-a framework for the Azure cloud platform to write bag-of-tasks class of distributed applications. Azure provides a straightforward and general interface that permits developers to concentrate on their application logic rather than cloud interaction. While we have implemented AzureBOT on Azure cloud platform, our framework design is generic to most of the cloud platforms. In this paper, we present the detailed design of our framework's internal architecture, the APIs in brief, and the usability of our framework. We also discuss the implementation of two different applications and their scalability results over 100 Azure worker processors.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114832591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RTL Simulation of High Performance Dynamic Reconfiguration: A Video Processing Case Study","authors":"Lingkan Gong, O. Diessel, Johny Paul, W. Stechele","doi":"10.1109/IPDPSW.2013.79","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.79","url":null,"abstract":"Dynamically Reconfigurable Systems (DRS) allow hardware logic to be partially reconfigured while the rest of the design continues to operate. For example, the AutoVision driver assistance system swaps video processing engines when the driving conditions change. However, the architectural flexibility of DRS also introduces challenges for verifying system functionality. Using AutoVision as a case study, this paper studies the use of a recent RTL simulation library, ReSim, to perform functional verification of DRS designs. Compared with the conventional Virtual Multiplexing approach, ReSim more accurately simulates the AutoVision system before, during and after reconfigurations. With trivial development and simulation overhead, ReSim assisted in detecting significantly more bugs than found using Virtual Multiplexing. To the best of our knowledge, this paper is the first significant effort towards functionally verifying a cutting-edge, complex, real-world DRS application.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114563906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Accuracy and Performance of Collaborative Filtering Algorithm by Stochastic SVD and Its MapReduce Implementation","authors":"Che-Rung Lee, Ya-Fang Chang","doi":"10.1109/IPDPSW.2013.120","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.120","url":null,"abstract":"Collaborative filtering algorithms that extract desired information from records have been widely used in data mining and information retrieval, such as recommendation systems. However, the rapidly increased data size demands more efficient and scalable algorithms and implementations. In this paper, we present a novel algorithm that utilizes stochastic singular value decomposition (SSVD) in the calculation of item-based collaborative filtering. The use of SSVD does not only provide more accurate results in terms of precision and recall, but also reduces the computational cost. The proposed algorithm was implemented using Hadoop MapReduce, which allows distributed processing of massive data stored in a distributed file system. The implementation was evaluated and compared with the recommendation systems provided in the Apache Mahout project, and a 2.53 speedup can be obtained for processing millions of records. The accuracy of our algorithm is also 3 times better than the non-SVD algorithm in terms of the F1 metric, a combinative measurement of precision and recall.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117208109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Distributed Workflows with Edge Resources","authors":"S. Ramakrishnan, Robert Reutiman, A. Chandra, J. Weissman","doi":"10.1109/IPDPSW.2013.240","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.240","url":null,"abstract":"Distributed data-intensive workflow applications are increasingly relying on and integrating remote resources including community data sources, services, and computational platforms. Increasingly, these are made available as data, SAAS, and IAAS clouds. The execution of distributed data-intensive workflow applications can expose network bottlenecks between clouds that compromise performance. In this paper, we focus on alleviating network bottlenecks by using a proxy network. In particular, we show how proxies can eliminate network bottlenecks by smart routing and perform in-network computations to boost workflow application performance. A novel aspect of our work is the inclusion of multiple proxies to accelerate different workflow stages optimizing different performance metrics. We show that the approach is effective for workflow applications and broadly applicable. Using Montage as an exemplar workflow application, results obtained through experiments on PlanetLab showed how different proxies acting in a variety of roles can accelerate distinct stages of Montage. Our microbenchmarks also show that routing data through select proxies can accelerate network transfer for TCP/UDP bandwidth, delay, and jitter, in general.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117284077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward a Scalable Heterogeneous Runtime System for the Convey MX Architecture","authors":"John D. Leidel, J. Bolding, Geoffrey Rogers","doi":"10.1109/IPDPSW.2013.18","DOIUrl":"https://doi.org/10.1109/IPDPSW.2013.18","url":null,"abstract":"Given the recent advent of the multicore era [1], research efforts in the area of high performance, low latency runtime systems have increased significantly. This research has given birth to new techniques in low-overhead scheduling techniques, small-memory footprint parallel execution units and kernel-free contextual environments. This paper presents a framework and runtime system for a truly heterogeneous approach to low-latency, high performance runtime techniques on the Convey MX-100 platform and CHOMP micro-architecture [14]. This framework, deemed the Convey Lightweight Runtime [CLR], is designed to provide high performance, programming-model agnostic parallel library support to the massively parallel CHOMP infrastructure. This work explores the fundamental design requirements and implementation details behind constructing the CLR system as a truly heterogeneous low-level runtime system for a wide array of parallel programming model targets.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128253860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}