Michael Matheny, S. Schlachter, L. M. Crouse, E. T. Kimmel, Trilce Estrada, Marcel Schumann, R. Armen, G. Zoppetti, M. Taufer
{"title":"ExSciTecH: Expanding volunteer computing to Explore Science, Technology, and Health","authors":"Michael Matheny, S. Schlachter, L. M. Crouse, E. T. Kimmel, Trilce Estrada, Marcel Schumann, R. Armen, G. Zoppetti, M. Taufer","doi":"10.1109/eScience.2012.6404451","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404451","url":null,"abstract":"This paper presents ExSciTecH, an NSF-funded project deploying volunteer computing (VC) systems to Explore Science, Tecenology, and Health. ExSciTecH aims at radically transforming VC systems and the volunteer's experience. To pursue this goal, ExSciTecH integrates and uses gameplay environments into BOINC, a well-known VC middleware, to involve the volunteers not only for simply donating idle cycles but also for actively participating in scientific discovery, i.e., generating new simulations side by side with the scientists. More specifically, ExSciTecH plugs into the BOINC framework extending it with two main gaming components, i.e., a learning component that includes a suite of games for training users on relevant biochemical concepts, and an engaging component that includes a suite of games to engage volunteers in drug design and scientific discovery. We assessed the impact of a first implementation of the learning game on a group of students at the University of Delaware. Our tests clearly show how ExSciTecH can generate higher levels of enthusiasm than more traditional learning tools in our students.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"8 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84163478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image retrieval in the unstructured data management system AUDR","authors":"Junwu Luo, B. Lang, Chao Tian, Danchen Zhang","doi":"10.1109/eScience.2012.6404474","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404474","url":null,"abstract":"The explosive growth of image data leads to severe challenges to the traditional image retrieval methods. In order to manage massive images more accurate and efficient, this paper firstly proposes a scalable architecture for image retrieval based on a uniform data model and makes this function a sub-engine of AUDR, an advanced unstructured data management system, which can simultaneously manage several kinds of unstructured data including image, video, audio and text. The paper then proposes a new image retrieval algorithm, which incorporates rich visual features and two text models for multi-modal retrieval. Experiments on both ImageNet dataset and ImageCLEF medical dataset show that our proposed architecture and the new retrieval algorithm are appropriate for efficient management of massive image.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"1 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81342863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of protein solubility in E. coli","authors":"T. Samak, D. Gunter, Zhong Wang","doi":"10.1109/eScience.2012.6404416","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404416","url":null,"abstract":"Gene synthesis is a key step to convert digitally predicted proteins to functional proteins. However, it is a relatively expensive and labor-intensive process. About 30-50% of the synthesized proteins are not soluble, thereby further reduces the efficacy of gene synthesis as a method for protein function characterization. Solubility prediction from primary protein sequences holds the promise to dramatically reduce the cost of gene synthesis. This work presents a framework that creates models of solubility from sequence information. From the primary protein sequences of the genes to be synthesized, sequence features can be used to build computational models for solubility. This way, biologists can focus the effort on synthesizing genes that are highly likely to generate soluble proteins. We have developed a framework that employs several machine learning algorithms to model protein solubility. The framework is used to predict protein solubility in the Escherichia coli expression system. The analysis is performed on over 1,600 quantified proteins. The approach successfully predicted the solubility with more than 80% accuracy, and enabled in depth analysis of the most important features affecting solubility. The analysis pipeline is general and can be applied to any set of sequence features to predict any binary measure. The framework also provides the biologist with a comprehensive comparison between different learning algorithms, and insightful feature analysis.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"10 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75740407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jeremy Goecks, The Galaxy Team, A. Nekrutenko, James Taylor
{"title":"Lessons learned from Galaxy, a Web-based platform for high-throughput genomic analyses","authors":"Jeremy Goecks, The Galaxy Team, A. Nekrutenko, James Taylor","doi":"10.1109/ESCIENCE.2012.6404442","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404442","url":null,"abstract":"High throughput sequencing assays have given rise to the field of genomics and transformed biomedical research into a computational science. Due to the large size of genomics datasets, high-performance computing is essential for analysis. Galaxy (http://galaxyproject.org) is a popular Web-based platform that can be used for all facets of genomic analyses, including data retrieval and integration, multi-step analysis, repeated analyses via workflows, visualization, collaboration, and publication. This paper describes Galaxy and discusses four lessons learned from the development of Galaxy. First, Galaxy uses open, extensible frameworks so that it can be adapted to new technologies as they become available. Second, by leveraging Web technologies, Galaxy makes genomics tools accessible to everyone and provides a common platform for collaboration. Third, Galaxy fosters community amongst both developers and users and encourages each community to adapt and extend Galaxy to meet their needs. Finally, Galaxy software development and genomic research are closely coupled, and challenges encountered during genomic research drive Galaxy development.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"51 4 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91014513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling scientific data sharing and re-use","authors":"B. Minsker, T. Wietsma","doi":"10.1109/eScience.2012.6404475","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404475","url":null,"abstract":"Research data sharing is one of the key challenges in the e-science era. IT technologies facilitate an enhanced management and sharing of research data. It is crucial to understand the current status of research data sharing in order to facilitate enhanced data sharing in the future. In this study, a conceptual model has been developed to characterize the process of data sharing and the factors which give rise to variations in data re-use. The study goes beyond a solely technical analysis and includes also psychological, social, organizational, legal and political components. The model was developed based on the literature and 21 face to face interviews with research, funding, data centre and publishing experts. It was validated by both a vigorous workshop and a further 55 structured telephone interviews. The overall model identifies sub-models of process, of context, and of drivers, barriers and enablers. These provide a comprehensive description of the factors that enable or inhibit the sharing of research data. They affect whether data are shared, how they are shared, and how successfully they are shared. Implementing the enablers will help the research community overcome the barriers to data re-use to facilitate future e-science endeavors.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"14 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83723999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Crompton, B. Matthews, Erica Y. Yang, C. Neylon, S. Coles
{"title":"Collaborative information management in scientific research processes","authors":"S. Crompton, B. Matthews, Erica Y. Yang, C. Neylon, S. Coles","doi":"10.1109/ESCIENCE.2012.6404478","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404478","url":null,"abstract":"Research is an incremental process that both generates and consumes diverse artifacts over its lifetime. A typical research lifecycle may involve creating experimental or observational data using multiple facilities or instruments; refining raw data into derived data to test hypotheses; publishing and presenting the findings in various formats. Each stage of this process commonly involves support systems with independent management; this however hinders e-scholarship as human mediation is required to track and access related research outputs. In this paper, we describe a collaborative research information management infrastructure based on STFC facilities. The pilot system uses the InteRCom peer-to-peer protocol to propagate typed links between digital contents spread across repositories. The resultant linked web of data offers a simple but versatile solution to the tracking of research outputs in context, as these semantically annotated links form a graph of citation and provenance which can be analyzed, traversed or aggregated according to the link resource or property of interest.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"34 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88036806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open Social based group access control framework for e-Science data infrastructure","authors":"Hui Zhang, Wenjun Wu, ZhenAn Li","doi":"10.1109/eScience.2012.6404488","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404488","url":null,"abstract":"In an e-Science data infrastructure, access control is a vital component to facilitate the management of the collective data and computing resources shared by researchers from geographically distributed locations. But conventional virtual organization based access control frameworks are not suitable for self-organizing, ad-hoc and opportunistic scientific collaborations, in which scientists can easily set up group-oriented authorization rules across the administrative domains. Using the emerging OAuth2.0 protocol, this paper introduces a novel Open Social based access control framework to support ad-hoc team formation and user-controlled resource sharing. Our experiences with development of the framework in e-Science data infrastructure projects demonstrate that the proposed framework is a very promising approach to resource sharing in cross-domain e-science environments.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"5 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91002914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling large genomic data transfers using nation-wide and international dynamic lightpaths","authors":"J. Bot, M. D. Vos, S. Boele, M. Reinders, J. Kok","doi":"10.1109/eScience.2012.6404458","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404458","url":null,"abstract":"The recent advances made in high throughput genomic sequencing allow researchers to accurately determine the genetic make-up of an individual. Sharing this data across research institutes has proven to be challenging as the amount of data and available bandwidth cause large delays. Here, we present a network of dynamic lightpaths dedicated to the life sciences which connects research groups within the Netherlands to each other, to compute and storage providers and to commercial partners.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"53 1","pages":"1-2"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76262478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Calibration of watershed models using cloud computing","authors":"M. Humphrey, N. Beekwilder, J. Goodall, M. Ercan","doi":"10.1109/ESCIENCE.2012.6404420","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404420","url":null,"abstract":"Understanding hydrologic systems at the scale of large watersheds and river basins is critically important to society when faced with extreme events, such as floods and droughts, or with concerns about water quality. A critical requirement of watershed modeling is model calibration, in which the computational model's parameters are varied during a search algorithm in order to find the best match against physically-observed phenomena such as streamflow. Because it is generally performed on a laptop computer, this calibration phase can be very time-consuming, significantly limiting the ability of a hydrologist to experiment with different models. In this paper, we describe our system for watershed model calibration using cloud computing, specifically Microsoft Windows Azure. With a representative watershed model whose calibration takes 11.4 hours on a commodity laptop, our cloud-based system calibrates the watershed model in 43.32 minutes using 16 cloud cores (15.78x speedup), 11.76 minutes using 64 cloud cores (58.13x speedup), and 5.03 minutes using 256 cloud cores (135.89x speedup). We believe that such speed-ups offer the potential toward real-time interactive model creation with continuous calibration, ushering in a new paradigm for watershed modeling.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"3 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73145695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I. Stokes-Rees, D. O'Donovan, Peter Doherty, Meghan Porter-Mahoney, P. Śliż
{"title":"An integrated science portal for collaborative compute and data intensive protein structure studies","authors":"I. Stokes-Rees, D. O'Donovan, Peter Doherty, Meghan Porter-Mahoney, P. Śliż","doi":"10.1109/ESCIENCE.2012.6404425","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404425","url":null,"abstract":"The SBGrid Science Portal provides multi-modal access to computational infrastructure, data storage, and data analysis tools for the structural biology community. It incorporates features not previously seen in cyberinfrastructure science gateways. It enables researchers to securely share a computational study area, including large volumes of data and active computational workflows. A rich identity management system has been developed that simplifies federated access to US national cyberinfrastructure, distributed data storage, and high performance file transfer tools. It integrates components from the Virtual Data Toolkit, Condor, glideinWMS, the Globus Toolkit and Globus Online, the FreeIPA identity management system, Apache web server, and the Django web framework.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"22 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74037443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}