{"title":"Safe Double Blind Studies as a Service","authors":"Tyler J. Skluzacek, Suhail Rehman, Ian T Foster","doi":"10.1109/eScience.2017.82","DOIUrl":"https://doi.org/10.1109/eScience.2017.82","url":null,"abstract":"The emergence of IoT devices is revolutionizing various aspects of human life, including healthcare, where the use of such devices can potentially improve health outcomes for millions. However, the efficacy of treatments and protocols based on IoT devices is measured through the use of rigorous double-blind studies, which can be quite expensive to conduct as they traditionally require a third party mediator. In this paper, we propose CATnIP, a secure, centralized cloud hub for instrumenting and conducting double-blind studies, with an extended focus on seamless integration with IoT devices. This paper outlines the construction and security considerations of CATnIP, the motivations behind creating such a system, and an evaluation based on the Five Safes and Stakeholder frameworks.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121758560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Huerta, R. Haas, E. Hernandez, D. Katz, S. Anderson, P. Couvares, J. Willis, Timothy Bouvet, J. Enos, William T. C. Kramer, H. Leong, D. Wheeler
{"title":"BOSS-LDG: A Novel Computational Framework That Brings Together Blue Waters, Open Science Grid, Shifter and the LIGO Data Grid to Accelerate Gravitational Wave Discovery","authors":"E. Huerta, R. Haas, E. Hernandez, D. Katz, S. Anderson, P. Couvares, J. Willis, Timothy Bouvet, J. Enos, William T. C. Kramer, H. Leong, D. Wheeler","doi":"10.1109/eScience.2017.47","DOIUrl":"https://doi.org/10.1109/eScience.2017.47","url":null,"abstract":"We present a novel computational framework that connects Blue Waters, the NSF-supported, leadership-class supercomputer operated by NCSA, to the Laser Interferometer Gravitational-Wave Observatory (LIGO) Data Grid via Open Science Grid technology. To enable this computational infrastructure, we configured, for the first time, a LIGO Data Grid Tier-1 Center that can submit heterogeneous LIGO workflows using Open Science Grid facilities. In order to enable a seamless connection between the LIGO Data Grid and Blue Waters via Open Science Grid, we utilize Shifter to containerize LIGO’s workflow software. This work represents the first time Open Science Grid, Shifter, and Blue Waters are unified to tackle a scientific problem and, in particular, it is the first time a framework of this nature is used in the context of large scale gravitational wave data analysis. This new framework has been used in the last several weeks of LIGO’s second discovery campaign to run the most computationally demanding gravitational wave search workflows on Blue Waters, and accelerate discovery in the emergent field of gravitational wave astrophysics. We discuss the implications of this novel framework for a wider ecosystem of Higher Performance Computing users.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128980680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaborative Reuse of Streaming Dataflows in IoT Applications","authors":"S. Chaturvedi, S. Tyagi, Yogesh L. Simmhan","doi":"10.1109/eScience.2017.54","DOIUrl":"https://doi.org/10.1109/eScience.2017.54","url":null,"abstract":"Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming enable composition of continuous dataflows that execute persistently over data streams. They are used by Internet of Things (IoT) applications to analyze sensor data from Smart City cyber-infrastructure, and make active utility management decisions. As the ecosystem of such IoT applications that leverage shared urban sensor streams continue to grow, applications will perform duplicate pre-processing and analytics tasks. This offers the opportunity to collaboratively reuse the outputs of overlapping dataflows, thereby improving the resource efficiency. In this paper, we propose dataflow reuse algorithms that given a submitted dataflow, identifies the intersection of reusable tasks and streams from a collection of running dataflows to form a merged dataflow. Similar algorithms to unmerge dataflows when they are removed are also proposed. We implement these algorithms for the popular Apache Storm DSPS, and validate their performance and resource savings for 35 synthetic dataflows based on public OPMW workflows with diverse arrival and departure distributions, and on 21 real IoT dataflows from RIoTBench. We see that our Reuse algorithms reduce the count of running tasks by 38 – 46% for the two workloads, and a reduction in cumulative CPU usage of 36–51%, that can result in real cost savings on Cloud resources.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":" 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113950180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chuong V. Nguyen, Matt Adcock, S. Anderson, David R. Lovell, Nicole Fisher, J. Salle
{"title":"Towards High-Throughput 3D Insect Capture for Species Discovery and Diagnostics","authors":"Chuong V. Nguyen, Matt Adcock, S. Anderson, David R. Lovell, Nicole Fisher, J. Salle","doi":"10.1109/eScience.2017.90","DOIUrl":"https://doi.org/10.1109/eScience.2017.90","url":null,"abstract":"Digitisation of natural history collections not only preserves precious information about biological diversity, it also enables us to share, analyse, annotate and compare specimens to gain new insights. High-resolution, full-colour 3D capture of biological specimens yields color and geometry information complementary to other techniques (e.g., 2D capture, electron scanning and micro computed tomography). However 3D colour capture of small specimens is slow for reasons including specimen handling, the narrow depth of field of high magnification optics, and the large number of images required to resolve complex shapes of specimens. In this paper, we outline techniques to accelerate 3D image capture, including using a desktop robotic arm to automate the insect handling process; using a calibrated pan-tilt rig to avoid attaching calibration targets to specimens; using light field cameras to capture images at an extended depth of field in one shot; and using 3D Web and mixed reality tools to facilitate the annotation, distribution and visualisation of 3D digital models.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123144081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dai Hai Ton That, Gabriel Fils, Zhihao Yuan, T. Malik
{"title":"Sciunits: Reusable Research Objects","authors":"Dai Hai Ton That, Gabriel Fils, Zhihao Yuan, T. Malik","doi":"10.1109/eScience.2017.51","DOIUrl":"https://doi.org/10.1109/eScience.2017.51","url":null,"abstract":"Science is conducted collaboratively, often requiring knowledge sharing about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. In this paper, we present the sciunit, a reusable research object in which aggregated content is recomputable. We describe a Git-like client that efficiently creates, stores, and repeats sciunits. We show through analysis that sciunits repeat computational experiments with minimal storage and processing overhead. Finally, we provide an overview of sharing and reproducible cyberinfrastructure based on sciunits gaining adoption in the domain of geosciences.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"250 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115788681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding Software in Research: Initial Results from Examining Nature and a Call for Collaboration","authors":"Udit Nangia, D. Katz","doi":"10.1109/eScience.2017.78","DOIUrl":"https://doi.org/10.1109/eScience.2017.78","url":null,"abstract":"This lightning talk paper discusses an initial data set that has been gathered to understand the use of software in research, and is intended to spark wider interest in gathering more data. The initial data analyzes three months of articles in the journal Nature for software mentions. The wider activity that we seek is a community effort to analyze a wider set of articles, including both a longer timespan of Nature articles as well as articles in other journals. Such a collection of data could be used to understand how the role of software has changed over time and how it varies across fields.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"398 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132147613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Label Classification of Frog Species via Deep Learning","authors":"Jie Xie, Rui Xu, Jinglan Zhang, P. Roe","doi":"10.1109/eScience.2017.31","DOIUrl":"https://doi.org/10.1109/eScience.2017.31","url":null,"abstract":"Acoustic classification of frogs has received increasing attention for its promising application in ecological studies. Various studies have been proposed for classifying frog species, but most recordings are assumed to have only a single species. In this study, a method to classify multiple frog species in an audio clip is presented. To be specific, continuous frog recordings are first cropped into audio clips (10 seconds). Then, various time-frequency representations are generated for each 10-s recording. Next, instead of using traditional hand-crafted features, various features are extracted using pre-trained networks using three time-frequency representations: Fast-Fourier spectrogram, Constant-Q transform spectrogram, and Gammatone-like spectrogram. Finally, a binary relevance based multi-label classification approach is proposed to classify simultaneously vocalizing frog species with our proposed features. Our proposed method is verified using eight frog species widely distributed in Queensland, Australia. The results show that the proposed features extracted via pre-trained networks can achieve better classification performance when compared to hand-crafted features for classifying multiple simultaneously vocalizing species.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116478931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Oleynik, S. Panitkin, M. Turilli, Alessio Angius, S. Oral, K. De, A. Klimentov, J. Wells, S. Jha
{"title":"High-Throughput Computing on High-Performance Platforms: A Case Study","authors":"D. Oleynik, S. Panitkin, M. Turilli, Alessio Angius, S. Oral, K. De, A. Klimentov, J. Wells, S. Jha","doi":"10.1109/eScience.2017.43","DOIUrl":"https://doi.org/10.1109/eScience.2017.43","url":null,"abstract":"The computing systems used by LHC experiments has historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size re-source. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan - a DOE leadership facility in conjunction with traditional distributed high-throughput computing to reach sustained production scales of approximately 52M core-hours a years. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130456211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Turilli, Y. Babuji, André Merzky, M. Ha, M. Wilde, D. Katz, S. Jha
{"title":"Evaluating Distributed Execution of Workloads","authors":"M. Turilli, Y. Babuji, André Merzky, M. Ha, M. Wilde, D. Katz, S. Jha","doi":"10.1109/eScience.2017.41","DOIUrl":"https://doi.org/10.1109/eScience.2017.41","url":null,"abstract":"Resource selection and task placement for distributed execution poses conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the methods are ad hoc rather than being based on models. Consequently, partial and non-interoperable implementations proliferate. We address both the conceptual and implementation difficulties by experimentally characterizing diverse modalities of resource selection and task placement. We compare the architectures and capabilities of two systems: the AIMES middleware and Swift workflow scripting language and runtime. We integrate these systems to enable the distributed execution of Swift workflows on Pilot-Jobs managed by the AIMES middleware. Our experiments characterize and compare alternative execution strategies by measuring the time to completion of heterogeneous uncoupled workloads executed at diverse scale and on multiple resources. We measure the adverse effects of pilot fragmentation and early binding of tasks to resources and the benefits of backfill scheduling across pilots on multiple resources. We then use this insight to execute a multi-stage workflow across five production-grade resources. We discuss the importance and implications for other tools and workflow systems","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115122942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Metropolitan Area Infrastructure for Data Intensive Science","authors":"D. Abramson, J. Carroll, Chao Jin, M. Mallon","doi":"10.1109/eScience.2017.37","DOIUrl":"https://doi.org/10.1109/eScience.2017.37","url":null,"abstract":"The increasing amount of data being collected from simulations, instruments and sensors creates challenges for existing e-Science infrastructure. In particular, it requires new ways of storing, distributing and processing data in order to cope with both the volume and velocity of the data. The University of Queensland has recently designed and deployed MeDiCI, a data fabric that spans the metropolitan area and provides seamless access to data regardless of where it is created, manipulated and archived. MeDiCI is novel in that it exploits temporal and spatial locality to move data on demand in an automated manner. This means that data only needs to reside locally in high speed storage whilst being manipulated, and it can be archived transparently in high capacity, but slower, technologies at other times. MeDiCI is built on commercially available technologies. In this paper, we describe these innovations and present some early results.","PeriodicalId":137652,"journal":{"name":"2017 IEEE 13th International Conference on e-Science (e-Science)","volume":"225 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127205631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}