Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.): latest articles
{"title":"Applying Lessons from e-Discovery to Process Big Data using HPC","authors":"Sukrit Sondhi, R. Arora","doi":"10.1145/2616498.2616525","DOIUrl":"https://doi.org/10.1145/2616498.2616525","url":null,"abstract":"The term 'Big Data' defines large datasets that are difficult to use and manage through conventional software tools. Legal Electronic Discovery (e-Discovery) is a business domain which has massive consumption of Big Data, where electronic records such as e-mail, documents, databases and social media postings are processed in order to discover evidence that may be pertinent to legal/compliance needs, litigation or other investigations. Numerous vendors exist in the market to provide organizations with services such as data collection, digital forensics and electronic discovery. High-end instrumentation and modern information technologies are creating data at an ever increasing rate. The challenges associated with managing the large datasets are related to the capture, storage, search, sharing, analytics, and visualization of the data. Big Data also offers unprecedented opportunities in other fields, ranging from astronomy and biology to marketing and e-commerce. This paper presents lessons learnt from the legal e-Discovery domain that can be adapted to process Big Data effectively on HPC resources, thereby benefitting the various disciplines of science, engineering and business that are grappling with a deluge of Big Data challenges and opportunities.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"52 1","pages":"8:1-8:2"},"PeriodicalIF":0.0,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87474752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECSS Experience: Particle Tracing Reinvented","authors":"C. Rosales, R. McLay","doi":"10.1145/2616498.2616527","DOIUrl":"https://doi.org/10.1145/2616498.2616527","url":null,"abstract":"This work describes an implementation of distributed particle tracking that provides a 10,000x speedup over traditional schemes. While none of the techniques used to achieve this result are completely new, they have been used in combination to great effect in this project. The implementation includes parallel IO using HDF5, a flexible load balancing scheme, and dynamic buffering to achieve excellent performance at scale. The use of HDF5 decouples the size of the simulation generating the data from the particle tracing, providing a more flexible and efficient workflow. The load balancing scheme ensures that heterogeneous particle distributions do not result in a waste of computational resources by keeping all MPI tasks occupied at any given time. Dynamic buffering minimizes MPI exchanges across MPI tasks, a critical element in the performance improvements achieved.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"22 1","pages":"13:1-13:2"},"PeriodicalIF":0.0,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73688116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Calculation of Sensitivity Coefficients for Individual Airport Emissions in the Continental U.S. using CMAQ-DDM/PM","authors":"S. Boone, S. Arunachalam","doi":"10.1145/2616498.2616504","DOIUrl":"https://doi.org/10.1145/2616498.2616504","url":null,"abstract":"Fine particulate matter (PM2.5) is a federally-regulated air pollutant with well-known impacts on human health. The FAA's Destination 2025 program seeks to decrease aviation-related health impacts across the U.S. by 50% by the year 2018. Atmospheric models, such as the Community Multiscale Air Quality model (CMAQ), are used to estimate the atmospheric concentration of pollutants such as PM2.5. Sensitivity analysis of these models has long been limited to finite difference and regression-based methods, both of which require many computationally intensive model simulations to link changes in output with perturbations in input. Further, they are unable to offer detailed or ad hoc analysis for changes within a domain, such as changes in emissions on an airport-by-airport basis. In order to calculate the sensitivity of PM2.5 concentrations to emissions from individual airports, we utilize the Decoupled Direct Method in three dimensions (DDM-3D), an advanced sensitivity analysis tool recently implemented in CMAQ. DDM-3D allows calculation of sensitivity coefficients within a single simulation, eliminating the need for multiple model runs. However, while the output provides results for a variety of input perturbations in a single simulation, the processing time for each run is dramatically increased compared to simulations conducted without the DDM-3D module. Use of the XSEDE Stampede computing cluster allows us to calculate sensitivity coefficients for a large number of input parameters. This allows for a much wider variety of ad hoc aviation policy scenarios to be generated and evaluated than would be possible using other sensitivity analysis methods or smaller-scaled computing systems. We present a design of experiments to compute individual sensitivity coefficients for 139 major airports in the US, due to six different precursor emissions that form PM2.5 in the atmosphere. Simulations based on this design are currently in progress, with full results to be published at a later date.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"54 1","pages":"10:1-10:8"},"PeriodicalIF":0.0,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74824788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Launcher: A Shell-based Framework for Rapid Development of Parallel Parametric Studies","authors":"Lucas A. Wilson, John M. Fonner","doi":"10.1145/2616498.2616534","DOIUrl":"https://doi.org/10.1145/2616498.2616534","url":null,"abstract":"Petascale computing systems have enabled tremendous advances for traditional simulation and modeling algorithms that are built around parallel execution. Unfortunately, scientific domains using data-oriented or high-throughput paradigms have difficulty taking full advantage of these resources without custom software development. This paper describes our solution for rapidly developing parallel parametric studies using sequential or threaded tasks: the launcher. We detail how to get ensembles executing quickly through common job schedulers SGE and SLURM, and the various user-customizable options that the launcher provides. We illustrate the efficiency of our tool by presenting execution results at large scale (over 65,000 cores) for varying workloads, including a virtual screening workload with indeterminate runtimes using the drug docking software Autodock Vina.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"146 1","pages":"40:1-40:8"},"PeriodicalIF":0.0,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86404758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Descriptive Data Analysis of File Transfer Data","authors":"S. Srinivasan, Victor Hazlewood, G. D. Peterson","doi":"10.1145/2616498.2616550","DOIUrl":"https://doi.org/10.1145/2616498.2616550","url":null,"abstract":"Millions of files and multiple terabytes of data are transferred to and from the University of Tennessee's National Institute for Computational Sciences each month. New capabilities available with GridFTP version 5.2.2 include additional transfer log information unavailable in prior versions implemented within XSEDE. The transfer log data now available includes identification of source and destination endpoints, which unlocks a wealth of information that can be used to detail GridFTP activities across the Internet. This information can be used for a wide variety of reports of interest to individual XSEDE Service Providers and to XSEDE Operations. In this paper, we discuss the new capabilities available for transfer logs in GridFTP 5.2.2, our initial attempt to organize, analyze, and report on this file transfer data for NICS, and its applicability to XSEDE Service Providers. Analysis of this new information can provide insight into effective and efficient utilization of GridFTP resources including identification of potential areas of GridFTP file transfer improvement (e.g., network and server tuning) and potential predictive analysis to improve efficiency.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"112 1","pages":"37:1-37:8"},"PeriodicalIF":0.0,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85777550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PGDB: A Debugger for MPI Applications","authors":"Nikoli Dryden","doi":"10.1145/2616498.2616535","DOIUrl":"https://doi.org/10.1145/2616498.2616535","url":null,"abstract":"As MPI applications scale to larger machines, errors that had been hidden from testing at smaller scales begin to manifest themselves. It is therefore necessary to extend debuggers to work at these scales, in order for efficient development of correct applications to proceed. PGDB is the Parallel GDB, an open-source debugger for MPI applications that provides such a capability. It is designed from the ground up to be a robust debugging environment at scale, while presenting an interface similar to that of the typical command-line GDB debugger. Its usage on representative debugging problems is demonstrated and its scalability on the Stampede supercomputer is evaluated.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"75 1","pages":"44:1-44:7"},"PeriodicalIF":0.0,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77155305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DNA Subway: Making Genome Analysis Egalitarian","authors":"Uwe Hilgert, S. McKay, M. Khalfan, Jason J. Williams, Cornel Ghiban, D. Micklos","doi":"10.1145/2616498.2616575","DOIUrl":"https://doi.org/10.1145/2616498.2616575","url":null,"abstract":"DNA Subway bundles research-grade bioinformatics tools, high-performance computing, and databases into easy-to-use workflows. Students have been \"riding\" different lines since 2010, to predict and annotate genes in up to 150kb of raw DNA sequence (Red Line), identify homologs in sequenced genomes (Yellow Line), identify species using DNA barcodes and construct phylogenetic trees (Blue Line), and examine RNA sequence (RNA-Seq) datasets for transcript abundance and differential expression (Green Line). With support for plant and animal genomes, DNA Subway engages students in their own learning, bringing to life key concepts in molecular biology, genetics, and evolution. Integrated DNA barcoding and RNA extraction wet-lab experiments support a variety of inquiry-based projects using student-generated data. Products of student research can be exported, published, and used in follow-up experiments. To date, DNA Subway has over 8,000 registered users who have produced 51,000 projects. Based on the popular Tuxedo Protocol, the Green Line was introduced in January 2014 as an easy-to-use workflow to analyze RNA-Seq datasets. The workflow uses iPlant's APIs (http://agaveapi.co/) to access high-performance compute resources of NSF's Extreme Scientific and Engineering Discovery Environment (XSEDE), providing the first easy \"on ramp\" to biological supercomputing.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"27 1","pages":"70:1-70:3"},"PeriodicalIF":0.0,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82707674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating Job Predictions into the SEAGrid Science Gateway","authors":"Ye Fan, Sudhakar Pamidighantam, Warren Smith","doi":"10.1145/2616498.2616563","DOIUrl":"https://doi.org/10.1145/2616498.2616563","url":null,"abstract":"This paper describes the process of incorporating predictions of job queue wait times and run times into a Science Gateway. Science Gateways that integrate multiple resources can use predictions of queue wait times and run times to advise users when they choose where a job is executed or in an automated resource selection process. These predictions are also critical in executing workflows where it isn't feasible to have users specify where each task executes and the workflow management system therefore has to perform resource selection programmatically. The SEAGrid science gateway has partly integrated wait time prediction based on the Karnak prediction service and is in the process of extending this to run time prediction.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"31 1","pages":"57:1-57:3"},"PeriodicalIF":0.0,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86019719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Integrated Analytic Pipeline for Identifying and Predicting Genetic Interactions based on Perturbation Data from High Content Double RNAi Screening","authors":"Zheng Yin, Fuhai Li, Stephen T. C. Wong","doi":"10.1145/2616498.2616513","DOIUrl":"https://doi.org/10.1145/2616498.2616513","url":null,"abstract":"In this paper, we describe an integrated data analysis pipeline for identifying and predicting genetic interactions based on cellular responses to perturbations of single- and multiple-agents. This pipeline was developed in the context of genome wide single-RNAi screens and smaller scale double-RNAi screens using Drosophila KC-167 cell lines, with the aim to reconstruct the molecular pathways regulating changes in cell shape. The TACC (Texas Advanced Computing Center) under XSEDE framework allocated 100,000 service units (SUs) from its Stampede system to facilitate image quantification and signaling pathway modeling using fluorescence images of Drosophila cells, and recently a kinome-wide single RNAi screening has been reported [1].","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"05 1","pages":"7:1-7:2"},"PeriodicalIF":0.0,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85910456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Performance Analysis for Scientific Applications","authors":"Fei Xing, Haihang You, Charng-Da Lu","doi":"10.1145/2616498.2616555","DOIUrl":"https://doi.org/10.1145/2616498.2616555","url":null,"abstract":"As high-performance computing (HPC) heads towards the exascale era, application performance analysis becomes more complex and less tractable. It usually requires considerable training, experience, and a good working knowledge of hardware/software interaction to use performance tools effectively, which becomes a barrier for domain scientists. Moreover, instrumentation and profiling activities from a large run can easily generate gigantic data volume, making both data management and characterization another challenge. To cope with these, we develop a statistical method to extract the principal performance features and produce easily interpretable results. This paper introduces a performance analysis methodology based on the combination of Variable Clustering (VarCluster) and Principal Component Analysis (PCA), describes the analysis process, and gives experimental results of scientific applications on a Cray XT5 system. As a visualization aid, we use Voronoi tessellations to map the numerical results into graphical forms to convey the performance information more clearly.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"2 1","pages":"62:1-62:8"},"PeriodicalIF":0.0,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89355620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}