{"title":"A Provenance Approach to Trace Scientific Experiments on a Grid Infrastructure","authors":"A. Benabdelkader, M. Santcroos, S. Madougou, A. V. Kampen, S. Olabarriaga","doi":"10.1109/eScience.2011.27","DOIUrl":"https://doi.org/10.1109/eScience.2011.27","url":null,"abstract":"Large experiments on distributed infrastructures become increasingly complex to manage, in particular to trace all computations that gave origin to a piece of data or an event such as an error. The work presented in this paper describes the design and implementation of an architecture to support experiment provenance and its deployment in the concrete case of a particular e-infrastructure for biosciences. The proposed solution consists of: (a) a data provenance repository to capture scientific experiments and their execution path, (b) a software tool (crawler) that gathers, classifies, links, and stores the information collected from various sources, and (c) a set of user interfaces through which the end-user can access the provenance data, interpret the results, and trace the sources of failure. The approach is based on an OPM-compliant API, PLIER, that is flexible to support future extensions and facilitates interoperability among heterogeneous application systems.","PeriodicalId":299889,"journal":{"name":"2011 IEEE Seventh International Conference on eScience","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116932704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Experiences Using GlideinWMS and the Corral Frontend across Cyberinfrastructures","authors":"M. Rynge, G. Juve, Gaurang Mehta, E. Deelman, K. Larson, B. Holzman, I. Sfiligoi, Frank Wurthwein, G. Bruce Berriman, S. Callaghan","doi":"10.1109/ESCIENCE.2011.50","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2011.50","url":null,"abstract":"Even with Grid technologies, the main mode of access for the current High Performance Computing and High Throughput Computing infrastructures is logging in via ssh. This mode of access locks scientists to particular machines as it is difficult to move the codes and environments between hosts. In this paper we show how switching the resource access mode to a Condor glidein-based overlay can bring together computational resources from multiple cyberinfrastructures. This approach provides scientists with a computational infrastructure anchored around the familiar environment of the desktop computer. Additionally, the approach enhances the reliability of applications and workflows by automatically rerouting jobs to functioning infrastructures. Two different science applications were used to demonstrate applicability, one from the field of astronomy and the other one from earth sciences. We demonstrate that a desktop computer is viable as a submit host and central manager for these kinds of glidein overlays. However, issues of ease of use and security need to be considered.","PeriodicalId":299889,"journal":{"name":"2011 IEEE Seventh International Conference on eScience","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115523501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"iGrid: Interactive Grid","authors":"M. Meoni","doi":"10.1109/eScience.2011.34","DOIUrl":"https://doi.org/10.1109/eScience.2011.34","url":null,"abstract":"Grid computing is traditionally applied to batch jobs in scientific and academic computing. This is also the case at WLCG, the global computing infrastructure providing the production and analysis environments for the LHC experiments at CERN. In this paper we envision the next generation Grid computing systems to support High Energy Physics (HEP) online sessions. We refine a resource management framework for enabling on-demand private virtual clusters on the AliEn Grid middleware and run applications on an online computing environment. We describe the software components distributed among the submission node and the site front-end and execution nodes. Our experiment evaluates the scalability of the agents deployed at the execution nodes and measures the performance of PROOF - the de-facto standard software for HEP data analysis - on our Grid environment rather than on a local cluster. Because PROOF generates time-sensitive I/O and requires handling potentially thousands of network connections, we describe a multiplexed I/O approach as the ideal solution for scalable and high-performing proxy services.","PeriodicalId":299889,"journal":{"name":"2011 IEEE Seventh International Conference on eScience","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128580254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building a Semantic Knowledge-base for Painting Conservators","authors":"J. Hunter, Suleiman Odat","doi":"10.1109/eScience.2011.32","DOIUrl":"https://doi.org/10.1109/eScience.2011.32","url":null,"abstract":"The Twentieth Century Paint project is a collaboration between the Asia Pacific Twentieth Century Conservation Art Research Network (APTCCARN) and the eResearch Lab at the University of Queensland. It is a collaborative effort to explore the preservation of twentieth-century paintings in Asia and the Pacific. One of the key objectives is to establish an online knowledge-base that will provide conservators with access to integrated, structured information and a portfolio of experiments and case studies that document the different causes of paint degradation and the optimum conservation treatments. This paper describes the knowledge-base and the associated ontology and services developed by the eResearch Lab in collaboration with APTCCARN. This work provides a flexible but robust framework that will enable future expansion of the knowledge base through both harvesting of structured data and collaborative input by domain experts.","PeriodicalId":299889,"journal":{"name":"2011 IEEE Seventh International Conference on eScience","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124550110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Scalable Architecture for e-Science Data Management","authors":"S. Toor, M. Sabesan, S. Holmgren, T. Risch","doi":"10.1109/eScience.2011.37","DOIUrl":"https://doi.org/10.1109/eScience.2011.37","url":null,"abstract":"The massive increase in the size of the data provided by e-Science applications requires not only increasing the capabilities of resources, but also designing new strategies for efficient utilization of already available resources. In this paper we present a scalable approach to extend a file-oriented storage system, Chelonia, with geographically distributed databases defined by a generic database schema. The database schema is able to model the data from typical e-Science applications. The system includes a web service query service allowing e-Science applications to query the required data.","PeriodicalId":299889,"journal":{"name":"2011 IEEE Seventh International Conference on eScience","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114822149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a Parallel Hybrid Direct/Iterative Solver for CFD Problems","authors":"J. Thies, F. Wubs","doi":"10.1109/eScience.2011.60","DOIUrl":"https://doi.org/10.1109/eScience.2011.60","url":null,"abstract":"We discuss the parallel implementation of a hybrid direct/iterative solver for a special class of saddle point matrices arising from the discretization of the steady Navier-Stokes equations on an Arakawa C-grid, the F-matrices. The two-level method described here has the following properties: (i) it is very robust, even at comparatively high Reynolds Numbers, (ii) a single parameter controls fill and convergence, making the method straightforward to use, (iii) the convergence rate is independent of the number of unknowns, (iv) it can be implemented on distributed memory machines in a natural way, (v) the matrix on the second level has the same structure and numerical properties as the original problem, so the method can be applied recursively. The implementation focusses on generality, modularity, code reuse and recursiveness. The solver is implemented using building blocks of the Trilinos libraries. We show its performance on a parallel computer for the Navier-Stokes equations.","PeriodicalId":299889,"journal":{"name":"2011 IEEE Seventh International Conference on eScience","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121993108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Phylogenetic Analysis Using SciHmm Cloud-based Scientific Workflow","authors":"Kary A. C. S. Ocaña, Daniel de Oliveira, Jonas Dias, Eduardo S. Ogasawara, M. Mattoso","doi":"10.1109/eScience.2011.17","DOIUrl":"https://doi.org/10.1109/eScience.2011.17","url":null,"abstract":"Phylogenetic analysis and multiple sequence alignment (MSA) are closely related bioinformatics fields. Phylogenetic analysis makes extensive use of MSA in the construction of phylogenetic trees, which are used to infer the evolutionary relationships between homologous genes. These bioinformatics experiments are usually modeled as scientific workflows. There are many alternative workflows that use different MSA methods to conduct phylogenetic analysis and each one can produce MSAs of different quality. Scientists have to explore which MSA method is the most suitable for their experiments. However, workflows for phylogenetic analysis are both computationally and data intensive and they may run sequentially for weeks. Although there are many approaches that parallelize these workflows, exploring all MSA methods may become a burdensome and expensive task. If scientists knew the most adequate MSA method a priori, it would save time and money. To optimize the phylogenetic analysis workflow, we propose in this paper SciHmm, a bioinformatics scientific workflow based on profile hidden Markov models (pHMMs) that aims at determining the most suitable MSA method for a phylogenetic analysis prior to executing the phylogenetic workflow. SciHmm is also executed in parallel in a cloud environment using SciCumulus middleware. The results demonstrated that optimizing a phylogenetic analysis using SciHmm considerably reduces the total execution time of phylogenetic analysis (up to 80%). This optimization also demonstrates that the biological results are of higher quality. In addition, the parallel execution of SciHmm demonstrates that this kind of bioinformatics workflow is suitable to be executed in the cloud.","PeriodicalId":299889,"journal":{"name":"2011 IEEE Seventh International Conference on eScience","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129780930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large Scale Participatory Acoustic Sensor Data Analysis: Tools and Reputation Models to Enhance Effectiveness","authors":"A. Truskinger, Hao-Fan Yang, J. Wimmer, Jinglan Zhang, I. Williamson, P. Roe","doi":"10.1109/eScience.2011.29","DOIUrl":"https://doi.org/10.1109/eScience.2011.29","url":null,"abstract":"Acoustic sensors play an important role in augmenting the traditional biodiversity monitoring activities carried out by ecologists and conservation biologists. With this ability however comes the burden of analysing large volumes of complex acoustic data. Given the complexity of acoustic sensor data, fully automated analysis for a wide range of species is still a significant challenge. This research investigates the use of citizen scientists to analyse large volumes of environmental acoustic data in order to identify bird species. Specifically, it investigates ways in which the efficiency of a user can be improved through the use of species identification tools and the use of reputation models to predict the accuracy of users with unidentified skill levels. Initial experimental results are reported.","PeriodicalId":299889,"journal":{"name":"2011 IEEE Seventh International Conference on eScience","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130866411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Open-source Collaboration Environment for Metagenomics Research","authors":"Xiaoquan Su, Yongzheng Ma, Hongwei Yang, Xingzhi Chang, Kai Nan, Jian Xu, K. Ning","doi":"10.1109/eScience.2011.10","DOIUrl":"https://doi.org/10.1109/eScience.2011.10","url":null,"abstract":"By analyzing metagenomic data from microbial communities, the taxonomical and functional components of hundreds of previously unknown microbial communities have been elucidated in the past few years. However, metagenomic data analyses are both data- and computation-intensive, and require extensive computational power. Most of the current metagenomic data analysis software was designed to be used on a single PC (Personal Computer), which cannot match the computational requirements of the fast-increasing number of large metagenomic projects. Therefore, an advanced computational environment has to be developed to cope with such needs. In this paper, we propose an open-source collaboration environment for metagenomic data analysis, which enables the parallel analysis of multiple metagenomic datasets at the same time. By using this collaboration environment, researchers from different locations can submit their data, collaboratively configure the analysis pipeline, and perform data analysis efficiently. As of now, more than 30 metagenomic data analysis projects have already been conducted based on this environment.","PeriodicalId":299889,"journal":{"name":"2011 IEEE Seventh International Conference on eScience","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121064845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CarbBuilder: An Adjustable Tool for Building 3D Molecular Structures of Carbohydrates for Molecular Simulation","authors":"M. Kuttel, Y. Mao, G. Widmalm, M. Lundborg","doi":"10.1109/eScience.2011.61","DOIUrl":"https://doi.org/10.1109/eScience.2011.61","url":null,"abstract":"CarbBuilder is a software tool for building 3D structures of carbohydrates, which are the most structurally varied of all molecular classes. CarbBuilder was designed with the dual aims of portability and adaptability, using an iterative software development approach. CarbBuilder employs a simple algorithm, using heuristics based upon experimental data to convert a primary structure description of a carbohydrate molecule into a three-dimensional structure file. This straightforward approach means that CarbBuilder can be easily adapted: users can add additional monosaccharide building blocks or alter the conformational defaults to suit specific requirements. The output carbohydrate structure can be used for subsequent molecular simulation investigations. CarbBuilder is freely available and portable: it is a text-based stand-alone program that can run on Windows, Linux and MacOS X systems without installation.","PeriodicalId":299889,"journal":{"name":"2011 IEEE Seventh International Conference on eScience","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133729945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}