Shane Halloran, J. Shi, Yu Guan, Xi Chen, Michael Dunne-Willows, J. Eyre. "Remote Cloud-Based Automated Stroke Rehabilitation Assessment Using Wearables." 2018 IEEE 14th International Conference on e-Science (e-Science), pp. 302-302, October 2018. DOI: 10.1109/eScience.2018.00063.
Abstract: We outline a system enabling accurate remote assessment of stroke rehabilitation levels using wrist-worn accelerometer time-series data. The system builds its features from clustering models applied to sliding windows over the data and performs its computation in the cloud. Predictive models are built using advanced machine learning techniques.
A. Bilgin, L. Hollink, J. V. Ossenbruggen, E. T. K. Sang, Kim Smeenk, Frank Harbers, M. Broersma. "Utilizing a Transparency-Driven Environment Toward Trusted Automatic Genre Classification: A Case Study in Journalism History." 2018 IEEE 14th International Conference on e-Science (e-Science), pp. 486-496, October 2018. DOI: 10.1109/eScience.2018.00137.
Abstract: With the growing abundance of unlabeled data in real-world tasks, researchers have to rely on the predictions of black-box computational models. However, it is an often neglected fact that these models may score high on accuracy for the wrong reasons. In this paper, we present a practical impact analysis of enabling model transparency through various presentation forms. For this purpose, we developed an environment that empowers non-computer scientists to become practicing data scientists in their own research fields. We demonstrate the gradually increasing understanding of journalism historians through a real-world case study on automatic genre classification of newspaper articles. This study is a first step towards the trusted, responsible use of machine learning pipelines.
{"title":"Power Asymmetries of eHumanities Infrastructures","authors":"Max Kemman","doi":"10.1109/eScience.2018.00103","DOIUrl":"https://doi.org/10.1109/eScience.2018.00103","url":null,"abstract":"Digital research infrastructures simultaneously enable and confine the research practices of scholars, constituting a power relation. This power relation can be characterised as a power asymmetry, with scholars dependent on the developers of infrastructures. In order to reduce this power asymmetry, infrastructures are developed in collaboration between scholars and computational researchers. Through an analysis of over twenty interviews, I will investigate the role of knowledge asymmetry, the ignorance of how a collaborator performs their tasks, and how this relates to power asymmetry in eScience collaborations in digital history. I will moreover consider how these asymmetries pose a challenge in the development and adoption of research infrastructures in the humanities.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"85 1","pages":"370-371"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83902118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bryce D. Mecum, Matthew B. Jones, D. Vieglais, C. Willis. "Preserving Reproducibility: Provenance and Executable Containers in DataONE Data Packages." 2018 IEEE 14th International Conference on e-Science (e-Science), pp. 45-49, October 2018. DOI: 10.1109/eScience.2018.00019.
Abstract: Many data packaging standards are available to researchers and data repository operators, and the choice between using an existing standard and creating a new one is challenging. We introduce the DataONE Data Package standard, which is based on the existing OAI-ORE Resource Map standard. We describe the functionality the Data Package standard provides and its implementation considerations, compare it to existing standards, and discuss future extensions, including the ability to describe execution environments via WholeTale "Tales" and alternate serialization formats.
F. I. Pelupessy, B. V. Werkhoven, G. Oord, S. Zwart, A. V. Elteren, H. Dijkstra. "Development of the OMUSE/AMUSE Modeling System." 2018 IEEE 14th International Conference on e-Science (e-Science), pp. 374-374, October 2018. DOI: 10.1109/eScience.2018.00105.
Abstract: The Oceanographic Multipurpose Software Environment (OMUSE, [1]) is an open-source framework developed for oceanographic and other earth-system modelling applications. OMUSE provides a homogeneous environment for interfacing with numerical simulation codes. It was developed at the IMAU (Utrecht) using coupling technology developed for astrophysical applications in the AMUSE project at Leiden Observatory [2,3]. OMUSE simplifies the use and deployment of numerical simulation codes. Furthermore, the design of the OMUSE interfaces (figure 1) allows codes that represent different physics or span different ranges of physical scales to be easily combined in novel numerical experiments. The use cases for OMUSE range from running simple numerical experiments with single codes, and adding data analysis tools to model runs, to setting up fairly complicated, strongly coupled solvers for problems that are intrinsically multi-scale and/or require different physics. Here, we present the design of OMUSE and give examples of the types of couplings that can be implemented with it. The example provided by AMUSE and OMUSE suggests that the same interfacing philosophy can be applied to a more extensive set of disciplines. To facilitate this, a better separation of the core framework and domain-specific code is necessary. We present ongoing work to support meteorological and hydrological applications and to use the framework as the computational core of the eWatercycle project [4]. For this, adaptations are being made to improve interoperability with existing interface efforts (such as the BMI), and we discuss developments regarding the encapsulation of OMUSE/AMUSE and its component models in containers. This will facilitate installation for first-time users, removing a barrier in this respect, and we anticipate that it will also offer more flexible deployment options for the framework.
Brennan Bell, T. Dinter, Vlad Merticariu, B. P. Huu, D. Misev, P. Baumann. "Navigating Sea-Ice Timeseries Data using Tracklines." 2018 IEEE 14th International Conference on e-Science (e-Science), pp. 392-392, October 2018. DOI: 10.1109/eScience.2018.00115.
Abstract: Scientists are often interested in sampling buffered regions of data across multiple time slices in array datacubes. For instance, in studying sea-ice distributions, a string of geographic coordinates with timestamps is requested, representing a sample or ship track line of a measurement campaign. A defined region is sampled around each of those data points using a nearest-neighbour approach in time and a buffer or polygon clipping in the spatial domain. Such queries can be handled discretely across the time domain, as there is no temporal interpolation, and as a result the tiling of the extracted rasters is well defined by the tiling of the source data. What happens when the resulting object should itself be represented by a 3-D raster, as in the case where the trackline consists of continuous buffered sampling across the timeseries? Spatio-temporal data is typically stored in chunked 3-D arrays, where multiple time slices appear in the same "tile" or subarray. Unlike the discrete version, tracing out a polygonally shaped buffer along a ship's path in a 3-D spatio-temporal datacube leads to shearing across the spatial tiles in the result raster, and this shearing prevents an a priori tiling of the result. Here, we present several approaches to tiling the result raster and provide a mathematical investigation of the impact these approaches can have on performance. To substantiate the theoretical investigation, we provide an implementation and performance benchmarks for the different tiling approaches, demonstrated on sea-ice data as a case study. As future work, we discuss approaches to parallelization that use these techniques as a basis for thread safety, establishing the results on arbitrary R+ trees and extending them to R* trees.
Etienne Brangbour, P. Bruneau, S. Marchand-Maillet. "Extracting Flood Maps from Social Media for Assimilation." 2018 IEEE 14th International Conference on e-Science (e-Science), pp. 272-273, October 2018. DOI: 10.1109/eScience.2018.00045.
Abstract: This abstract states the position of the Publimape project and outlines progress achieved since its recent start.
{"title":"Big Provenance Stream Processing for Data Intensive Computations","authors":"Isuru Suriarachchi, S. Withana, Beth Plale","doi":"10.1109/eScience.2018.00039","DOIUrl":"https://doi.org/10.1109/eScience.2018.00039","url":null,"abstract":"In the business and research landscape of today, data analysis consumes public and proprietary data from numerous sources, and utilizes any one or more of popular data-parallel frameworks such as Hadoop, Spark and Flink. In the Data Lake setting these frameworks co-exist. Our earlier work has shown that data provenance in Data Lakes can aid with both traceability and management. The sheer volume of fine-grained provenance generated in a multi-framework application motivates the need for on-the-fly provenance processing. We introduce a new parallel stream processing algorithm that reduces fine-grained provenance while preserving backward and forward provenance. The algorithm is resilient to provenance events arriving out-of-order. It is evaluated using several strategies for partitioning a provenance stream. The evaluation shows that the parallel algorithm performs well in processing out-of-order provenance streams, with good scalability and accuracy.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"35 1","pages":"245-255"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75853693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tim Shaffer, Kyle M. D. Sweeney, Nathaniel Kremer-Herman, D. Thain. "A First Look at the JX Workflow Language." 2018 IEEE 14th International Conference on e-Science (e-Science), pp. 352-353, October 2018. DOI: 10.1109/eScience.2018.00094.
Abstract: Scientific workflows are typically expressed as a graph of logical tasks, each representing a single program along with its input and output files. This poster introduces JX (JSON eXtended), a declarative language that can express complex workloads as an assembly of sub-graphs that can be partitioned in flexible ways. We present a case study of using JX to represent complex workflows for the Lifemapper biodiversity project. We evaluate partitioning approaches across several computing environments, including ND-Condor, IU-Jetstream, and SDSC-Comet, and show that a coarse partitioning results in faster turnaround times, reduced data transfer, and lower master utilization across all three systems.
Nasir U. Eisty, G. Thiruvathukal, Jeffrey C. Carver. "A Survey of Software Metric Use in Research Software Development." 2018 IEEE 14th International Conference on e-Science (e-Science), pp. 212-222, October 2018. DOI: 10.1109/eScience.2018.00036.
Abstract: Background: Breakthroughs in research increasingly depend on complex software libraries, tools, and applications aimed at supporting specific science, engineering, business, or humanities disciplines. The complexity and criticality of this software motivate the need to ensure its quality and reliability. Software metrics are a key tool for assessing, measuring, and understanding software quality and reliability. Aims: The goal of this work is to better understand how research software developers use traditional software engineering concepts, such as metrics, to support and evaluate both the software and the software development process. One key aspect of this goal is to identify how the set of metrics relevant to research software corresponds to the metrics commonly used in traditional software engineering. Method: We surveyed research software developers to gather information about their knowledge and use of code metrics and software process metrics. We also analyzed the influence of demographics (project size, development role, and development stage) on these metrics. Results: The survey results, from 129 respondents, indicate that respondents have a general knowledge of metrics. However, their knowledge of specific software engineering metrics is lacking, and their use of them is even more limited. The most-used metrics relate to performance and testing. Even though code complexity often poses a significant challenge to research software development, respondents did not report much use of code metrics. Conclusions: Research software developers appear to be interested in, and see some value in, software metrics, but may be encountering roadblocks when trying to use them. Further study is needed to determine the extent to which these metrics could provide value for continuous process improvement.