{"title":"Simultaneous scheduling of replication and computation for bioinformatics applications on the grid","authors":"F. Desprez, Antoine Vernois","doi":"10.1109/CLADE.2005.1520903","DOIUrl":"https://doi.org/10.1109/CLADE.2005.1520903","url":null,"abstract":"One of the first motivations of using grids comes from applications managing large data sets in fields such as high energy physics or life sciences. To improve the global throughput of software environments, replicas are usually put at wisely selected sites. Moreover, computation requests have to be scheduled among the available resources. To get the best performance, scheduling and data replication have to be tightly coupled. However, there are few approaches that provide this coupling. This paper presents an algorithm that combines data management and scheduling using a steady-state approach. Our theoretical results are validated using simulation and logs from a large life science application (ACI GRID GriPPS).","PeriodicalId":330715,"journal":{"name":"CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.","volume":"71 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134545546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure, sharing and preservation of scientific experiment data","authors":"S. Pallickara, Beth Plale, S. Jensen, Yiming Sun","doi":"10.1109/CLADE.2005.1520912","DOIUrl":"https://doi.org/10.1109/CLADE.2005.1520912","url":null,"abstract":"In mesoscale meteorology, the quantity of information has increased significantly due to sophisticated data distribution schemes combined with developments in sensors and instruments capable of monitoring the lower several kilometers of the atmosphere at higher levels of resolution. This paper introduces myLEAD, a personalized information management tool for geoscience users. MyLEAD eases the data and information overload on the scientist by providing explicit solutions to the problems of structure, sharing, and preservation. This paper describes strategies within the myLEAD system to personalize data product and representation which ultimately leads to personalized workspaces and collaborative environments. We also include experimental results from some of the experiments that we conducted.","PeriodicalId":330715,"journal":{"name":"CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128459052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simplifying FusionGrid security","authors":"J. Burruss, T. Fredian, M. Thompson","doi":"10.1109/CLADE.2005.1520908","DOIUrl":"https://doi.org/10.1109/CLADE.2005.1520908","url":null,"abstract":"Since inception in 2001, FusionGrid developers have worked to secure computational resources in a multi-institutional environment with geographically dispersed users. Recent improvements to grid security have streamlined the usage and administration of resources. More than simply increasing security, these improvements have made FusionGrid security easier for resource administrators and the fusion scientists that use FusionGrid, allowing them to get work done with minimal inconvenience. Improvements in authentication, authorization, and data handling have been welcomed by fusion scientists and promise to ease the burden of adding new resources to the grid.","PeriodicalId":330715,"journal":{"name":"CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127924965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A composable data management architecture for scientific applications","authors":"Yu Ma, R. Bramley","doi":"10.1109/CLADE.2005.1520897","DOIUrl":"https://doi.org/10.1109/CLADE.2005.1520897","url":null,"abstract":"With ever increasing computational power, data management has become crucial for scientific applications today. Most on-going research efforts are dedicated to generalizing universal requirements and schemas for building generic data management systems. By closely studying a broad range of scientific applications including X-ray crystallography, radiation therapy, automated photometry and comparative genomics, a composable data management architecture for scientific applications is presented, which instead aims at providing a set of orthogonal components for each scientific application to quickly and easily construct its customized data management system. The prototype architecture is described in detail, and the component interfaces are defined in SIDL (Scientific Interface Definition Language). Results of building customized data management systems for a variety of scientific applications are also discussed.","PeriodicalId":330715,"journal":{"name":"CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122629081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges in large scale distributed computing: bioinformatics","authors":"T. Disz, Michael Kubal, R. Olson, R. Overbeek, R. Stevens","doi":"10.1109/CLADE.2005.1520902","DOIUrl":"https://doi.org/10.1109/CLADE.2005.1520902","url":null,"abstract":"The amount of genomic data available for study is increasing at a rate similar to that of Moore's law. This deluge of data is challenging bioinformaticians to develop newer, faster and better algorithms for analysis and examination of this data. The growing availability of large scale computing grids coupled with high-performance networking is challenging computer scientists to develop better, faster methods of exploiting parallelism in these biological computations and deploying them across computing grids. In this paper, we describe two computations that are required to be run frequently and which require large amounts of computing resource to complete in a reasonable time. The data for these computations are very large and the sequential computational time can exceed thousands of hours. We show the importance and relevance of these computations, the nature of the data and parallelism and we show how we are meeting the challenge of efficiently distributing and managing these computations in the SEED project.","PeriodicalId":330715,"journal":{"name":"CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.","volume":"08 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127544967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic selection of loop scheduling algorithms using reinforcement learning","authors":"Sumithra Dhandayuthapani, I. Banicescu, R. Cariño, Eric Hansen, J. P. Pabico, M. Rashid","doi":"10.1109/CLADE.2005.1520907","DOIUrl":"https://doi.org/10.1109/CLADE.2005.1520907","url":null,"abstract":"This paper presents the design and implementation of a reinforcement learning agent that automatically selects appropriate loop scheduling algorithms for parallel loops embedded in time-stepping scientific applications executing on clusters. There may be a number of such loops in an application, and the loops may have different load balancing requirements. Further, loop characteristics may also change as the application progresses. Following a model-free learning approach, the learning agent assigned to a loop selects from a library the best scheduling algorithm for the loop during the lifetime of the application. The utility of the learning agent is demonstrated by its successful integration into the simulation of wave packets - an application arising from quantum mechanics. Results of statistical analysis using pairwise comparison of means on the running time of the simulation with and without the learning agent validate the effectiveness of the agent in improving the parallel performance of the simulation.","PeriodicalId":330715,"journal":{"name":"CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133562262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GISolve: a grid-based problem solving environment for computationally intensive geographic information analysis","authors":"Shaowen Wang, M. Armstrong, J. Ni, Yan Y. Liu","doi":"10.1109/CLADE.2005.1520892","DOIUrl":"https://doi.org/10.1109/CLADE.2005.1520892","url":null,"abstract":"The purpose of this paper is to demonstrate the design and implementation of GISolve - a grid-based problem solving environment for computationally intensive geographic information analysis based on geo-middleware. The geo-middleware resides between existing grid middleware and geographic information analysis applications to manage heterogeneous and dynamic resources on behalf of analysis applications. At the same time, GISolve provides adaptive domain decomposition solutions to parallel geographic information analysis applications. Based on these domain decomposition solutions, GISolve also schedules distributed tasks and manages data transfers. In GISolve, these capabilities are designed as grid services that are compliant with the open grid service architecture (OGSA) and are implemented using grid portal technologies. The GISolve implementation is illustrated based on a case study of a computationally intensive spatial statistic - G*_i(d) that is used to assess spatial dependence among geographically distributional observations.","PeriodicalId":330715,"journal":{"name":"CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123541125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaborative e-science architecture for Reaction Kinetics research community","authors":"T. Pham, L. Lau, P. Dew, M. Pilling","doi":"10.1109/CLADE.2005.1520893","DOIUrl":"https://doi.org/10.1109/CLADE.2005.1520893","url":null,"abstract":"This paper presents a novel collaborative e-science architecture (CeSA) to address two challenging issues in e-science that arise from the management of heterogeneous distributed environments: (i) how to provide individual scientists an integrated environment to collaborate with each other in distributed, loosely coupled research communities where each member might be using a disparate range of tools; and (ii) how to provide easy access to a range of computationally intensive resources from a desktop. The Reaction Kinetics research community was used to capture the requirements and in the evaluation of the proposed architecture. The result demonstrated the feasibility of the approach and the potential benefits of the CeSA.","PeriodicalId":330715,"journal":{"name":"CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126645243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reverb: middleware for distributed application forensics","authors":"Patrick M. Widener","doi":"10.1109/CLADE.2005.1520906","DOIUrl":"https://doi.org/10.1109/CLADE.2005.1520906","url":null,"abstract":"We observe the recent research trend toward large-scale, component-based distributed systems that are dynamically configurable or extensible in response to changing execution environments or end-user needs. Regardless of whether these configuration changes happen automatically through predefined adaptation or self-management methods or in response to explicit user interaction, they can jeopardize the integrity of application components. Moreover, they can cause unexpected effects in system performance or even lead to disputes about middleware or application providers' responsibilities for failures experienced by end users. This paper introduces Reverb, a set of middleware abstractions and mechanisms that can be used to: (1) audit configuration actions; (2) impose controls on permissible actions; and (3) control which principals are permitted to carry out configurations. To evaluate Reverb, it has been integrated into middleware used in the high performance domain. The intent of this integration is to not only demonstrate its viability and utility, but also to show that Reverb-based configuration control has little effect on the performance of the distributed applications or middleware that use it. Experimental results attained with Reverb-enabled middleware used with resource-constrained pervasive applications demonstrate the small performance impact of Reverb's rich new functionality.","PeriodicalId":330715,"journal":{"name":"CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128412542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A user-transparent recoverable file system for distributed computing environment","authors":"H. Kim, H. Yeom","doi":"10.1109/CLADE.2005.1520898","DOIUrl":"https://doi.org/10.1109/CLADE.2005.1520898","url":null,"abstract":"In a distributed computing environment, particularly grid, fault-tolerance is one of the core functionalities the system should provide. MPICH-GF is such a resilient system designed to resist external or internal failures, especially for message passing applications in the grid environment. But it does not stand the loss of a valuable resource: files. In a normal case, users open files and write data into them in an asynchronous manner, and checkpointing is initiated with no regard to the state of the context of the process. Therefore, the checkpointing system should automatically recognize the running process and protect the open files transparently. We have implemented a recoverable file system, named ReFS, which is incorporated into our fault-tolerant system MPICH-GF. ReFS is a versioning-like file system. ReFS provides middleware libraries with the system call interface to protect specific files at a given time. This prevents applications from processing their jobs with corrupted data and resulting in incorrect results in case of failures. We have focused not only on the reliability of the system but also on the reduction of inevitable overheads. This paper describes the design and implementation of ReFS and justifies the validity of the behavior of ReFS. We have developed ReFS on Linux, based on Ext2.","PeriodicalId":330715,"journal":{"name":"CLADE 2005. Proceedings Challenges of Large Applications in Distributed Environments, 2005.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133363825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}