Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)

Applying Lessons from e-Discovery to Process Big Data using HPC

Pub Date: 2014-07-13; DOI: 10.1145/2616498.2616525

Sukrit Sondhi, R. Arora

Abstract: The term 'Big Data' defines large datasets that are difficult to use and manage through conventional software tools. Legal Electronic Discovery (e-Discovery) is a business domain which has massive consumption of Big Data, where electronic records such as e-mail, documents, databases and social media postings are processed in order to discover evidence that may be pertinent to legal/compliance needs, litigation or other investigations. Numerous vendors exist in the market to provide organizations with services such as data collection, digital forensics and electronic discovery. High-end instrumentation and modern information technologies are creating data at an ever increasing rate. The challenges associated with managing the large datasets are related to the capture, storage, search, sharing, analytics, and visualization of the data. Big Data also offers unprecedented opportunities in other fields, ranging from astronomy and biology to marketing and e-commerce. This paper presents lessons learnt from the legal e-Discovery domain that can be adapted to process Big Data effectively on HPC resources, thereby benefitting the various disciplines of science, engineering and business that are grappling with a deluge of Big Data challenges and opportunities.

Pages: 8:1-8:2

Citations: 6

ECSS Experience: Particle Tracing Reinvented

Pub Date: 2014-07-13; DOI: 10.1145/2616498.2616527

C. Rosales, R. McLay

Citations: 3

Calculation of Sensitivity Coefficients for Individual Airport Emissions in the Continental U.S. using CMAQ-DDM/PM

Pub Date: 2014-07-13; DOI: 10.1145/2616498.2616504

S. Boone, S. Arunachalam

Abstract: Fine particulate matter (PM2.5) is a federally-regulated air pollutant with well-known impacts on human health. The FAA's Destination 2025 program seeks to decrease aviation-related health impacts across the U.S. by 50% by the year 2018. Atmospheric models, such as the Community Multiscale Air Quality model (CMAQ), are used to estimate the atmospheric concentration of pollutants such as PM2.5. Sensitivity analysis of these models has long been limited to finite difference and regression-based methods, both of which require many computationally intensive model simulations to link changes in output with perturbations in input. Further, they are unable to offer detailed or ad hoc analysis for changes within a domain, such as changes in emissions on an airport-by-airport basis. In order to calculate the sensitivity of PM2.5 concentrations to emissions from individual airports, we utilize the Decoupled Direct Method in three dimensions (DDM-3D), an advanced sensitivity analysis tool recently implemented in CMAQ. DDM-3D allows calculation of sensitivity coefficients within a single simulation, eliminating the need for multiple model runs. However, while the output provides results for a variety of input perturbations in a single simulation, the processing time for each run is dramatically increased compared to simulations conducted without the DDM-3D module. Use of the XSEDE Stampede computing cluster allows us to calculate sensitivity coefficients for a large number of input parameters. This allows for a much wider variety of ad hoc aviation policy scenarios to be generated and evaluated than would be possible using other sensitivity analysis methods or smaller-scaled computing systems. We present a design of experiments to compute individual sensitivity coefficients for 139 major airports in the US, due to six different precursor emissions that form PM2.5 in the atmosphere. Simulations based on this design are currently in progress, with full results to be published at a later date.

Pages: 10:1-10:8

Citations: 1

Launcher: A Shell-based Framework for Rapid Development of Parallel Parametric Studies

Pub Date: 2014-07-13; DOI: 10.1145/2616498.2616534

Lucas A. Wilson, John M. Fonner

Citations: 16

Descriptive Data Analysis of File Transfer Data

Pub Date: 2014-07-13; DOI: 10.1145/2616498.2616550

S. Srinivasan, Victor Hazlewood, G. D. Peterson

Abstract: There are millions of files and multi-terabytes of data transferred to and from the University of Tennessee's National Institute for Computational Sciences each month. New capabilities available with GridFTP version 5.2.2 include additional transfer log information previously unavailable in prior versions implemented within XSEDE. The transfer log data now available includes identification of source and destination endpoints which unlocks a wealth of information that can be used to detail GridFTP activities across the Internet. This information can be used for a wide variety of reports of interest to individual XSEDE Service Providers and to XSEDE Operations. In this paper, we discuss the new capabilities available for transfer logs in GridFTP 5.2.2, our initial attempt to organize, analyze, and report on this file transfer data for NICS, and its applicability to XSEDE Service Providers. Analysis of this new information can provide insight into effective and efficient utilization of GridFTP resources including identification of potential areas of GridFTP file transfer improvement (e.g., network and server tuning) and potential predictive analysis to improve efficiency.

Pages: 37:1-37:8

Citations: 3

PGDB: A Debugger for MPI Applications

Pub Date: 2014-07-13; DOI: 10.1145/2616498.2616535

Nikoli Dryden

Citations: 7

DNA Subway: Making Genome Analysis Egalitarian

Pub Date: 2014-07-13; DOI: 10.1145/2616498.2616575

Uwe Hilgert, S. McKay, M. Khalfan, Jason J. Williams, Cornel Ghiban, D. Micklos

Abstract: DNA Subway bundles research-grade bioinformatics tools, high-performance computing, and databases into easy-to-use workflows. Students have been "riding" different lines since 2010, to predict and annotate genes in up to 150kb of raw DNA sequence (Red Line), identify homologs in sequenced genomes (Yellow Line), identify species using DNA barcodes and construct phylogenetic trees (Blue Line), and examine RNA sequence (RNA-Seq) datasets for transcript abundance and differential expression (Green Line). With support for plant and animal genomes, DNA Subway engages students in their own learning, bringing to life key concepts in molecular biology, genetics, and evolution. Integrated DNA barcoding and RNA extraction wet-lab experiments support a variety of inquiry-based projects using student-generated data. Products of student research can be exported, published, and used in follow-up experiments. To date, DNA Subway has over 8,000 registered users who have produced 51,000 projects. Based on the popular Tuxedo Protocol, the Green Line was introduced in January 2014 as an easy-to-use workflow to analyze RNA-Seq datasets. The workflow uses iPlant's APIs (http://agaveapi.co/) to access high-performance compute resources of NSF's Extreme Scientific and Engineering Discovery Environment (XSEDE), providing the first easy "on ramp" to biological supercomputing.

Pages: 70:1-70:3

Citations: 13

Incorporating Job Predictions into the SEAGrid Science Gateway

Pub Date: 2014-07-13; DOI: 10.1145/2616498.2616563

Ye Fan, Sudhakar Pamidighantam, Warren Smith

Citations: 8

An Integrated Analytic Pipeline for Identifying and Predicting Genetic Interactions based on Perturbation Data from High Content Double RNAi Screening

Pub Date: 2014-07-13; DOI: 10.1145/2616498.2616513

Zheng Yin, Fuhai Li, Stephen T. C. Wong

Citations: 0

Statistical Performance Analysis for Scientific Applications

Pub Date: 2014-07-13; DOI: 10.1145/2616498.2616555

Fei Xing, Haihang You, Charng-Da Lu

Abstract: As high-performance computing (HPC) heads towards the exascale era, application performance analysis becomes more complex and less tractable. It usually requires considerable training, experience, and a good working knowledge of hardware/software interaction to use performance tools effectively, which becomes a barrier for domain scientists. Moreover, instrumentation and profiling activities from a large run can easily generate gigantic data volume, making both data management and characterization another challenge. To cope with these, we develop a statistical method to extract the principal performance features and produce easily interpretable results. This paper introduces a performance analysis methodology based on the combination of Variable Clustering (VarCluster) and Principal Component Analysis (PCA), describes the analysis process, and gives experimental results of scientific applications on a Cray XT5 system. As a visualization aid, we use Voronoi tessellations to map the numerical results into graphical forms to convey the performance information more clearly.

Pages: 62:1-62:8

Citations: 1