{"title":"Fine-Grained Profiling for Data-Intensive Workflows","authors":"N. Dun, K. Taura, A. Yonezawa","doi":"10.1109/CCGRID.2010.29","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.29","url":null,"abstract":"Profiling is an effective dynamic analysis approach to investigate complex applications. ParaTrac is a user-level profiler using file system and process tracing techniques for data-intensive workflow applications. In two respects ParaTrac helps users refine the orchestration of workflows. First, the profiles of I/O characteristics enable users to quickly identify bottlenecks of underlying I/O subsystems. Second, ParaTrac can exploit fine-grained data-processes interactions in workflow execution to help users understand, characterize, and manage realistic data-intensive workflows. Experiments on thoroughly profiling Montage workflow demonstrate that ParaTrac is scalable to tracing events of thousands of processes and effective in guiding fine-grained workflow scheduling or workflow management systems improvements.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123441331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Origin of Services Using RIDDL for Description, Evolution and Composition of RESTful Services","authors":"Juergen Mangler, P. Beran, E. Schikuta","doi":"10.1109/CCGRID.2010.126","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.126","url":null,"abstract":"WSDL as a description language serves as the foundation for a host of technologies ranging from semantic annotation to composition and evolution. Although WSDL is well understood and in widespread use, it has its shortcomings which are partly imposed by the way how the SOAP protocol works and is used. Cloud computing fostered the rise of Representational State Transfer (REST), a return to arguably simpler but more flexible ways to expose services solely through the HTTP protocol. For RESTful services many achievements that have been acquired have to be rethought and reapplied. We perceive that one of the biggest hurdles is the lack of a dedicated and simple yet powerful language to describe RESTful services. In this paper we want to introduce RIDDL, a flexible and extensible XML based language that not only allows to describe services but also covers the basic requirements of service composition and evolution to provide a clean foundation for further developments.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123982744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WORKEM: Representing and Emulating Distributed Scientific Workflow Execution State","authors":"L. Ramakrishnan, Dennis Gannon, Beth Plale","doi":"10.1109/CCGRID.2010.89","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.89","url":null,"abstract":"Scientific workflows have become an integral part of cyber infrastructure as their computational complexity and data sizes have grown. However, the complexity of the distributed infrastructure makes design of new workflows, determining the right management policies, debugging, testing or reproduction of errors challenging. Today, workflow engines manage the dependencies between tasks of workflows and there are tools available to wrap scientific codes. There is a need for a customizable, isolated and manageable testing container for design, evaluation and deployment of distributed workflows. To build such an environment, we need to be able to model and represent, capture and possibly reuse the execution flows within each task of a workflow that accurately captures the execution behavior. In this paper, we present the design and implementation of WORKEM, an extensible framework that can be used to represent and emulate workflow execution state. We also detail the use of the framework in two specific case studies (a) design and testing of an orchestration system (b) generation of a provenance database. Our evaluation shows that the framework has minimal overheads and can be scaled to run hundreds of workflows in short durations of time and with a high amount of parallelism.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129769619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Asynchronous Communication Schemes for Finite Difference Methods on Multiple GPUs","authors":"D. Playne, K. Hawick","doi":"10.1109/CCGRID.2010.86","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.86","url":null,"abstract":"Finite difference methods continue to provide an important and parallelisable approach to many numerical simulations problems. Iterative multigrid and multilevel algorithms can converge faster than ordinary finite difference methods but can be more difficult to parallelise. Data parallel paradigms tend to lend themselves particularly well to solving regular mesh PDEs whereby low latency communications and high compute to communications ratios can yield high levels of computational efficiency and raw performance. We report on some practical algorithmic and data layout approaches and on performance data on a range of Graphical Processing Units (GPUs) with CUDA. We focus on the use of multiple GPU devices with a single CPU host.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127411630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting the Quality of Service of a Peer-to-Peer Desktop Grid","authors":"Marcus Carvalho, Renato Miceli, P. D. Maciel, F. Brasileiro, R. Lopes","doi":"10.1109/CCGRID.2010.50","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.50","url":null,"abstract":"Peer-to-peer (P2P) desktop grids have been proposed as an economical way to increase the processing capabilities of information technology (IT) infrastructures. In a P2P grid, a peer donates its idle resources to the other peers in the system, and, in exchange, can use the idle resources of other peers when its processing demand surpasses its local computing capacity. Despite their cost-effectiveness, scheduling of processing demands on IT infrastructures that encompass P2P desktop grids is more difficult. At the root of this difficulty is the fact that the quality of the service provided by P2P desktop grids varies significantly over time. The research we report in this paper tackles the problem of estimating the quality of service of P2P desktop grids. We base our study on the OurGrid system, which implements an autonomous incentive mechanism based on reciprocity, called the Network of Favours (NoF). In this paper we propose a model for predicting the quality of service of a P2P desktop grid that uses the NoF incentive mechanism. The model proposed is able to estimate the amount of resources that is available for a peer in the system at future instants of time. We also evaluate the accuracy of the model by running simulation experiments fed with field data. Our results show that in the worst scenario the proposed model is able to predict how much of a given demand for resources a peer is going to obtain from the grid with a mean prediction error of only 7.2%.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127023493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Diskless Checkpoint for Large Scale Systems","authors":"L. Bautista-Gomez, N. Maruyama, F. Cappello, S. Matsuoka","doi":"10.1109/CCGRID.2010.40","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.40","url":null,"abstract":"In high performance computing (HPC), the applications are periodically check pointed to stable storage to increase the success rate of long executions. Nowadays, the overhead imposed by disk-based checkpoint is about 20% of execution time and in the next years it will be more than 50% if the checkpoint frequency increases as the fault frequency increases. Diskless checkpoint has been introduced as a solution to avoid the IO bottleneck of disk-based checkpoint. However, the encoding time, the dedicated resources (the spares) and the memory overhead imposed by diskless checkpoint are significant obstacles against its adoption. In this work, we address these three limitations: 1) we propose a fault tolerant model able to tolerate up to 50% of process failures with a low check pointing overhead 2) our fault tolerance model works without spare node, while still guarantying high reliability, 3) we use solid state drives to significantly increase the checkpoint performance and avoid the memory overhead of classic diskless checkpoint.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122251582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-criteria Content Adaptation Service Selection Broker","authors":"M. F. M. Fudzee, J. Abawajy, M. M. Deris","doi":"10.1109/CCGRID.2010.128","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.128","url":null,"abstract":"In this paper, we propose a service-oriented content adaptation framework and an approach to the Content Adaptation Service Selection (CASS) problem. In particular, the problem is how to assign adaptation tasks (e.g., transcoding, video summarization, etc) together with respective content segments to appropriate adaptation services. Current systems tend to be mostly centralized suffering from single point failures. The proposed algorithm consists of a greedy and single objective assignment function that is constructed on top of an adaptation path tree. The performance of the proposed service selection framework is studied in terms of efficiency of service selection execution under various conditions. The results indicate that the proposed policy performs substantially better than the baseline approach.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"26 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114132300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A High-Level Interpreted MPI Library for Parallel Computing in Volunteer Environments","authors":"T. LeBlanc, J. Subhlok, E. Gabriel","doi":"10.1109/CCGRID.2010.85","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.85","url":null,"abstract":"Idle desktops have been successfully used to run sequential and master-slave task parallel codes on a large scale in the context of volunteer computing. However, execution of message passing parallel programs in such environments is challenging because a pool of nodes to execute an application may have architectural and operating system heterogeneity, can include widely distributed nodes across security domains, and nodes may become unavailable for computation frequently and without warning. The VolPEx (Parallel Execution on Volatile Nodes) tool set is building MPI support in such environments based on selective use of process redundancy and message logging. However, addressing this challenge requires tradeoffs between performance, portability, and usability. The paper introduces a robust MPI library that is designed to be highly portable across heterogeneous architectures and operating systems. This VolpexPyMPI library is built with Python, works with Linux and Windows platforms and accepts user level MPI programs written in C or FORTRAN. The performance of VolpexPyMPI is compared with a traditional C based implementation of MPI. The paper examines in detail the tradeoffs of these usability focused and performance focused approaches.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"371 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115180368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"File-Access Characteristics of Data-Intensive Workflow Applications","authors":"Takeshi Shibata, SungJun Choi, K. Taura","doi":"10.1109/CCGRID.2010.77","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.77","url":null,"abstract":"This paper studies five real-world data intensive workflow applications in the fields of natural language processing, astronomy image analysis, and web data analysis. Data intensive workflows are increasingly becoming important applications for cluster and Grid environments. They open new challenges to various components of workflow execution environments including job dispatchers, schedulers, file systems, and file staging tools. Their impacts on real workloads are largely unknown. Under- standing characteristics of real-world workflow applications is a required step to promote research in this area. To this end, we analyse real-world workflow applications focusing on their file access patterns and summarize their implications to schedulers and file system/staging designs.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128027632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling the Next Generation of Scalable Clusters","authors":"W. Gropp","doi":"10.1109/CCGRID.2010.135","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.135","url":null,"abstract":"Clusters revolutionized computing by making supercomputer capabilities widely available. But one of the main drivers of that revolution, the rapid doubling of processor clock rates, ran out of steam several years ago. To maintain (or even increase) the historic rate of improvement in computing power, processor designs are rapidly increasing parallelism at all levels, including more functional units, more cores, and ways to share resources among threads. Heterogeneous designs that use more specialized processors such as GPGPUs are becoming common. The scale of high-end systems is also getting larger, with 1000-core systems becoming commonplace and systems with over 300,000 cores planned for 2011. However, the software and algorithms for these systems are still basically the same as when the cluster revolution began. Drawing on experiences with the sustained PetaFLOPS system, called Blue Waters, to be installed at Illinois in 2011, and with exploratory work into Exascale system designs, this talk will discuss some of the challenges facing the cluster community as scalability becomes increasingly important and reviews some of the developments in algorithms, programming models, and software frameworks that must complement the evolution of cluster hardware.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114235262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}