{"title":"Scientific Workflow Interchanging through Patterns: Reversals and Lessons Learned","authors":"Bruno F. Bastos, Regina M. M. Braga, A. A. Gomes","doi":"10.1109/eScience.2015.26","DOIUrl":"https://doi.org/10.1109/eScience.2015.26","url":null,"abstract":"Scientific workflows are used for dealing with complex problems in different e-science domains. These workflows are modeled and executed using Scientific Workflow Management Systems (SWfMSs). Generally, SWfMSs provide their own Workflow Specification Language (WfSL), and this is a challenge considering the possibility of interchanging workflow specifications between different SWfMSs. Nevertheless, the reuse of workflows gains growing importance as it helps with fostering the collaboration and cross-fertilization across different research groups. This paper presents a research proposal, including its mishaps and assimilations, on the use of workflow patterns combined with software architecture concepts to capture the key semantics expressed in scientific workflows specified in different WfSLs and to allow the interchanging of these specifications between different SWfMSs. This paper also shows how our findings based on real world specifications led us to reformulate our initial proposal and discuss the new results.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"34 1","pages":"557-564"},"PeriodicalIF":0.0,"publicationDate":"2015-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75511419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shape Analysis Using the Spectral Graph Wavelet Transform","authors":"J. Leandro, R. M. C. Junior, R. Feris","doi":"10.1109/eScience.2013.45","DOIUrl":"https://doi.org/10.1109/eScience.2013.45","url":null,"abstract":"The present work describes a framework for morphological characterization of galaxies based on the Spectral Graph Wavelet Transform. A galaxy image is sampled with a number of points randomly chosen, whose Delaunay triangulation results in an arbitrary graph. The average intensity value in a 5 × 5 vicinity of a pixel related to a graph vertex is assigned to the corresponding graph vertex. A weight inversely proportional to the photometric distance between each pair of vertices is assigned to the respective graph edge. The Spectral Graph Wavelet Transform is computed from this weighted graph with real-valued vertices yielding a high-dimensional feature vector, which is reduced to a two dimensional vector through Principal Component Analysis. The proposed framework has been assessed through two case studies, namely, the case study of analyzing (i) 2D binary images from shapes and preliminary results of (ii) 2D gray tone images from galaxies. The obtained results imply the suitability of this framework for the characterization of galaxies images.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"34 1","pages":"307-316"},"PeriodicalIF":0.0,"publicationDate":"2013-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76526015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized representation and mapping for social-ecological data: Freeing data from the database","authors":"S. Jensen, Beth Plale, Xiaozhong Liu, Miao Chen, David B. Leake, Julie England","doi":"10.1109/eScience.2012.6404486","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404486","url":null,"abstract":"Scientific discovery increasingly requires collaboration between scientific sub-domains that often have different representations for their data. To bridge gaps between varying domain representations, researchers are developing metadata and semantic representations meaningful to broader communities. Through exploiting these representations we propose a logical model and architecture by which cross-domain researchers can more easily discover, use, and eventually archive, data. In this paper we present an architecture, intermediate data model, and methodology for mapping diverse social-ecological data sources stored in relational databases to a common representation, and for classifying textual data using machine learning. The results are visualized through client views that are built against the general logical model, and applied against a longitudinal database from social-ecological research.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"52 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74789463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scientific workflow rewriting while preserving provenance","authors":"Sarah Cohen Boulakia, C. Froidevaux, Jiuqiang Chen","doi":"10.1109/eScience.2012.6404419","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404419","url":null,"abstract":"Scientific workflow systems are numerous and equipped of provenance modules able to collect data produced and consumed during workflow runs to enhance reproducibility. An increasing number of approaches have been developed to help managing provenance information. Some of them are able to process data in a polynomial time but they require workflows to have series-parallel (SP) structures. Rewriting any workflow into an SP workflow is thus particularly important. In this paper, (i) we introduce the concept of provenance-equivalent rewriting process, (ii) we review existing graph transformations, (iii) we design the provenance-equivalent SPFlow algorithm, (iv) we evaluate our approach over a thousand of real workflows.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"75 1","pages":"1-9"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74977586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"g-Social: Enhancing integrated e-science tools with Social Networking functionality","authors":"Andriani Stylianou, N. Loulloudes, M. Dikaiakos","doi":"10.1109/eScience.2012.6404454","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404454","url":null,"abstract":"During the last decade, the scientific community has witnessed an unprecedented deployment of large-scale, federated e-Infrastructures such as Grid Computing, primarily for supporting data-intensive scientific exploration and coordinated problem solving. However, practical experience and user studies have indicated that the adoption of such e-Infrastructures is lagging behind original expectations, a fact which is mainly attributed to the limited support that available tools provide for user collaboration and information sharing. The goal of this paper is twofold, first to lay down the foundations for building a collaboration environment in the form of abstractions and second to show the effectiveness of these abstractions through g-Social, an Eclipse-based, open-source environment as an extension to g-Eclipse, that provides a powerful, user-friendly, platform-independent toolset for users, application developers and administrators of Grid infrastructures. g-Social enables user collaboration and resource sharing through Online Social Networking services, capitalizing on the success that these services have.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"3 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84887225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-oriented research for bioresource utilization: A case study to investigate water uptake in cellulose using Principal Components","authors":"L. Ling, C. Driemeier, R. M. C. Junior","doi":"10.1109/eScience.2012.6404485","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404485","url":null,"abstract":"Bioresource utilization represents an important interdisciplinary research that integrates academic and industrial expertise across diverse scientific domains, including physics, chemistry, biology, and engineering. The present paper describes a cyber-infrastructure being created at the Brazilian Bioethanol Science and Technology Laboratory (CTBE) to assist scientists working on the field. One key element of the infrastructure is the LignoCel Platform, a tailor-made database for upload, curation, and sharing of lignocellulose data. Particularly, LignoCel allows querying the data and exporting subsets that are analyzed for knowledge extraction. In the present paper, a case-study is described, in which scientists want to investigate the dimensions that relate cellulose structure and water uptake. Data analysis and dimensionality reduction using Principal Component Analysis (PCA) is employed. Different PCA-based measurements are extracted and visualized through automatically-generated HTML pages available for the domain scientists. In this case study, the workflow successfully provided dimensionality reduction from a data matrix originated from a heterogeneous set of materials. PCA scores and loadings are explored for data analysis and visualization. PCA reduced the 11 measured features (obtained from three different experimental techniques, 55 possible combinations of size 2) into a two-dimensional PC1PC2 loadings plot representing 89% of data variance. Examples of the output produced by the system are available at http://data.bioetanol.org.br/~liu.ling/pca-lignocel/.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"2 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76565242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated data verification in a large-scale citizen science project: A case study","authors":"Jun Yu, S. Kelling, Jeff Gerbracht, Weng-Keen Wong","doi":"10.1109/eScience.2012.6404472","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404472","url":null,"abstract":"Although citizen science projects can engage a very large number of volunteers to collect volumes of data, they are susceptible to issues with data quality. Our experience with eBird, which is a broad-scale citizen science project to collect bird observations, has shown that a massive effort by volunteer experts is needed to screen data, identify outliers and flag them in the database. The increasing volume of data being collected by eBird places a huge burden on these volunteer experts and other automated approaches to improve data quality are needed. In this work, we describe a case study in which we evaluate an automated data quality filter that improves data quality by identifying outliers and categorizing these outliers as either unusual valid observations or mis-identified (invalid) observations. This automated data filter involves a two-step process: first, a data-driven method detects outliers (i.e., observations that are unusual for a given region and date). Next, we use a data quality model based on an observer's predicted expertise to decide if an outlier should be flagged for review. We applied this automated data filter retrospectively to eBird data from Tompkins Co., NY and found that this automated process significantly reduced the workload of reviewers by as much as 43% and identifies 52% more potentially invalid observations.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"93 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75969153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtual Simulation Objects concept as a framework for system-level simulation","authors":"S. Kovalchuk, Pavel A. Smirnov, Sergey S. Kosukhin, A. Boukhanovsky","doi":"10.1109/eScience.2012.6404413","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404413","url":null,"abstract":"This paper presents Virtual Simulation Objects (VSO) concept which forms theoretical basis for building tools and framework that is developed for system-level simulations using existing software modules available within cyber-infrastructure. Presented concept is implemented by the software tool for building composite solutions using VSO-based GUI and running them using CLAVIRE simulation environment.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"785 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76228270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A satellite data portal developed for crowdsourcing data analysis and interpretation","authors":"Zhenghui Hu, Wenjun Wu","doi":"10.1109/eScience.2012.6404453","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404453","url":null,"abstract":"Satellite data products derived from the remote sensing observations describe features of the land, ocean and atmosphere. And by data processing, they can be used to study processes and trends on local/global scale for real-time environmental research and applications. However, the advances of cutting-edge remote sensing technology bring the challenge of data deluge for satellite data analysis and interpretation. With combinations of human intelligence and machine intelligence, we develop a satellite data portal for crowdsourcing data analysis and interpretation through teaching and learning to cope with the overwhelming data deluge. Compared with all the existing data portals and crowdsourcing systems, it is the first attempt to embed crowdsourcing into a data portal to provide integrated services of satellite data access and analysis.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"58 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90339614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RightField: Semantic enrichment of Systems Biology data using spreadsheets","authors":"K. Wolstencroft, S. Owen, C. Goble, Quyen Nguyen, Olga Krebs, Wolfgang Müller","doi":"10.1109/ESCIENCE.2012.6404412","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404412","url":null,"abstract":"The interpretation and integration of experimental data depends on consistent metadata and uniform annotation. However, there are many barriers to the acquisition of this rich semantic metadata, not least the overhead and complexity of its collection by scientists. We present RightField, a lightweight spreadsheet-based annotation tool for lowering the barrier of manual metadata acquisition; and a data integration application for extracting and querying RDF data from these enriched spreadsheets. By hiding the complexities of semantic annotation, we can improve the collection of rich metadata, at source, by scientists. We illustrate the approach with results from the SysMO program, showing that RightField supports the whole workflow of semantic data collection, submission and RDF querying in Systems Biology. The RightField tool is freely available from http://www.rightfield.org.uk, and the code is open source under the BSD License.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"26 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90782986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}