{"title":"Exploiting Lustre File Joining for Effective Collective IO","authors":"Weikuan Yu, J. Vetter, S. Canon, Song Jiang","doi":"10.1109/CCGRID.2007.51","DOIUrl":"https://doi.org/10.1109/CCGRID.2007.51","url":null,"abstract":"Lustre is a parallel file system that presents high aggregated IO bandwidth by striping file extents across many storage devices. However, our experiments indicate excessively wide striping can cause performance degradation. Lustre supports an innovative file joining feature that joins files in place. To mitigate striping overhead and benefit collective IO, we propose two techniques: split writing and hierarchical striping. In split writing, a file is created as separate subfiles, each of which is striped to only a few storage devices. They are joined as a single file at the file close time. Hierarchical striping builds on top of split writing and orchestrates the span of subfiles in a hierarchical manner to avoid overlapping and achieve the appropriate coverage of storage devices. Together, these techniques can avoid the overhead associated with large stripe width, while still being able to combine bandwidth available from many storage devices. We have prototyped these techniques in the ROMIO implementation of MPI-IO. Experimental results indicate that split writing and hierarchical striping can significantly improve the performance of Lustre collective IO in terms of both data transfer and management operations. On a Lustre file system configured with 46 object storage targets, our implementation improves collective write performance of a 16-process job by as much as 220%.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114891699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bridging the High Performance Computing Gap: the OurGrid Experience","authors":"F. Brasileiro, E. Araújo, W. Voorsluys, Milena P. M. Oliveira, F. Figueiredo","doi":"10.1109/CCGRID.2007.28","DOIUrl":"https://doi.org/10.1109/CCGRID.2007.28","url":null,"abstract":"High performance computing is currently not affordable for those users that cannot rely on having a highly qualified computing support team. To cater for these users' needs we have proposed, implemented and deployed OurGrid. OurGrid is a peer-to-peer grid middleware that supports the automatic creation of large computational grids for the execution of embarrassingly parallel applications. It has been used to support the OurGrid Community - a public free-to-join grid that has been in production since December 2004. In this paper we show how the OurGrid Community has been used to support the execution of a number of applications. Further, we discuss the main benefits brought by the system and the difficulties that have been faced by the system developers and the users and managers of the OurGrid Community.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"57 6 Suppl 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133350494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Build-and-Test Workloads for Grid Middleware: Problem, Analysis, and Applications","authors":"A. Iosup, D. Epema","doi":"10.1109/CCGRID.2007.29","DOIUrl":"https://doi.org/10.1109/CCGRID.2007.29","url":null,"abstract":"The Grid promise is starting to materialize today: large-scale multi-site infrastructures have grown to assist the work of scientists from all around the world. This tremendous growth can be sustained and continued only through a higher quality of the middleware, in terms of deployability and of correct functionality. A potential solution to this problem is the adoption of industry practices regarding middleware building and testing. However, it is unclear what good build-and-test environments for grid middleware should look like, and how to use them efficiently. In this work we address both these problems. First, we study the characteristics of the NMI build-and-test environment, which handles millions of testing tasks annually, for major Grid middleware such as Condor, Globus, VDT, and gLite. Through the analysis of a system-wide trace covering the past two years we find the main characteristics of the workload, as well as the performance of the system under load. Second, we propose mechanisms for more efficient test management and operation, and for resource provisioning and evaluation. Notably, we propose a generic test optimization technique that reduces the test time by 95%, while achieving 93% of the maximum accuracy, under real conditions.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127766481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workflow Management in a Protein Clustering Application","authors":"J. L. Vázquez-Poletti, E. Huedo, R. Montero, I. Llorente","doi":"10.1109/CCGRID.2007.122","DOIUrl":"https://doi.org/10.1109/CCGRID.2007.122","url":null,"abstract":"Bioinformatics is demanding more computational resources day after day. The problems proposed by this area are growing in such complexity that traditional computing systems are not able to face them. For solving complex problems which can be divided into tasks with dependencies, a workflow management system must be employed. In this paper, we introduce the use of the workflow management of the GridWay metascheduler for running a Bioinformatics application which implements a complex algorithm performing protein clustering in order to obtain non-redundant protein databases. The use of a general purpose meta-scheduling system will provide the application the fault-tolerance and advance scheduling capabilities needed to execute on a highly dynamic, heterogeneous and faulty environment. The execution results on a production Grid (the EGEE infrastructure) show the dramatic impact of remote queue waiting times on the application performance, and the critical need for efficient re-scheduling capabilities.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128961049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parameter Sweeps for Functional MRI Research in the \"Virtual Laboratory for e-Science\" Project","authors":"S. Olabarriaga, A. Nederveen, B. O. Nualláin","doi":"10.1109/CCGRID.2007.82","DOIUrl":"https://doi.org/10.1109/CCGRID.2007.82","url":null,"abstract":"Image analysis is an important component of neuroscience research. The ICT infrastructure and technical knowledge needed to perform (large scale) neuroimaging studies, however, is often not available to the neuroscientists. The \"virtual laboratory for e-sciences\" project provides an advanced (grid) infrastructure offering data and computing services to researchers from several application domains. In this paper we describe how this infrastructure is used in the context of functional magnetic resonance imaging (MRI) research, which is devoted to the study of brain activity due to stimulation. Our experience in using a generic application (Nimrod, Monash University, Australia) to manage large parameter sweep experiments is presented. These experiments were performed to investigate the effect of one parameter (delay in the hemodynamic response function) in the analysis result, but also to evaluate the available infrastructure (grid resources and experiment management services). Initial results indicate that it is feasible and simple to perform large image analysis experiments using Nimrod once the environment has been properly prepared.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128436992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collective Interfaces for Distributed Components","authors":"F. Baude, D. Caromel, L. Henrio, M. Morel","doi":"10.1109/CCGRID.2007.32","DOIUrl":"https://doi.org/10.1109/CCGRID.2007.32","url":null,"abstract":"We propose to address collective communications in distributed components through collective interfaces. Collective interfaces handle data distribution, parallelism and synchronization, and they expose collective behaviors in the definition of components. We show, as an illustration, that collective interfaces allow the encoding of SPMD programming in a better structured and less error prone way. We verify the scalability and performance of collective interfaces in an experiment on up to 100 machines.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122012751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intelligent Scheduling and Replication in Datagrids: a Synergistic Approach","authors":"Ali Elghirani, Riky Subrata, Albert Y. Zomaya","doi":"10.1109/CCGRID.2007.65","DOIUrl":"https://doi.org/10.1109/CCGRID.2007.65","url":null,"abstract":"In large-scale data-intensive applications data plays a pivotal role in the execution of these applications, and data transfer is the primary cause of job execution delay. In environments such as the data grids with the need to execute jobs requiring large amounts of data, a smart collaborative environment between the scheduling and data management services to achieve a synergistic effect on the performance of the grid becomes essential. This paper presents an intelligent data grid framework where job scheduling and data and replica management are coupled to provide an integrated environment for efficient access to data and job scheduling. The data management service predicts and estimates the appropriate locations of replica and proactively replicates the datasets in these locations while the intelligent Tabu Search based scheduler incorporating information about the datasets dispatches the jobs to the sites guaranteeing minimum job execution time and better overall system utilization. Evaluation of the framework shows significant improvement in the performance of the grid and job execution time.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127307291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Analysis and Runtime Steering of Dynamic Workflows in the ASKALON Grid Environment","authors":"R. Prodan","doi":"10.1109/CCGRID.2007.76","DOIUrl":"https://doi.org/10.1109/CCGRID.2007.76","url":null,"abstract":"We present a new distributed performance analysis service of the ASKALON integrated Grid environment for computing runtime overheads of dynamic workflows in real time based on event correlation techniques. We illustrate a formal method to express precise overhead correlation rules, including several performance contracts as quality of service parameters based on fuzzy logic to be enforced in dynamic environments through rescheduling, various runtime optimisations, and steering techniques. We demonstrate experimental results for two real applications from material chemistry and graphics rendering domains.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125158517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PACE: Augmenting Personal Mobile Devices with Scalable Computing","authors":"Xun Luo","doi":"10.1109/CCGRID.2007.81","DOIUrl":"https://doi.org/10.1109/CCGRID.2007.81","url":null,"abstract":"In the near future personal mobile devices will become ingredients of the grid. Foreseeing this, the thesis work aims at leveraging scalable computing techniques to enhance mobile devices so that their interoperability with peers and grid infrastructure will be improved, and their unique resources could be better contributed to the grid. A middleware framework named personal augmented computing environment (PACE) is proposed, the research progress is reported, and future plan is introduced. The main contributions include 1) investigation of collaborative visualization using display clusters composed by mobile devices, 2) exploration of context-aware methods for mobile devices to achieve efficient utilization of peer and grid resources, and 3) development of novel scalable human-computer interaction techniques through seamless integration of mobile devices and environmental infrastructures.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"12 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133050967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Service-Oriented System to Support Data Integration on Data Grids","authors":"A. Gounaris, C. Comito, R. Sakellariou, D. Talia","doi":"10.1109/CCGRID.2007.12","DOIUrl":"https://doi.org/10.1109/CCGRID.2007.12","url":null,"abstract":"Data Grids provide transparent access to heterogeneous and autonomous data resources. The main contribution of this paper is the presentation of a data sharing system that (i) is tailored to data grids, (ii) supports well established and widely spread relational DBMSs, and (iii) adopts a hybrid architecture by relying on a peer model for query reformulation for retrieving semantically equivalent expressions, and on a wrapper-mediator integration model for accessing and querying distributed data sources. The system builds upon the infrastructure provided by the OGSA-DQP distributed query processor and the XMAP query reformulation algorithm. The paper discusses the implementation methodology, and also presents empirical evaluation results.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116292282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}