Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing: Latest Publications

Session details: Session II
E. Yildirim
DOI: 10.1145/3260992 (https://doi.org/10.1145/3260992) | Published: 2016-06-01
Citations: 0
SIDI: A Scalable in-Memory Density-based Index for Spatial Databases
D. Nguyen, K. Doan, T. Pham
DOI: 10.1145/2912152.2912158 (https://doi.org/10.1145/2912152.2912158) | Published: 2016-06-01
Abstract: With the widespread use of location-based services, spatial data is becoming increasingly common. Because the data is usually huge in volume and arrives at the storage continuously in real time, designing systems that store this type of data efficiently is challenging. Two major issues that complicate building such a system are the skewed distribution of the data and the need to scale the storage across multiple machines. In this paper, we propose a novel scalable in-memory density-based index for spatial databases. The key principle underlying our design is the exploitation of the stable spatial distribution of the datasets to deploy a simple but efficient index structure. We use information extracted from past data to split the entire space into independent pieces of similar density, ensuring load balancing and scalability. Experimental results show that the proposed solution scales well in a distributed environment and outperforms common indexes in many cases.
Citations: 4
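The density-balanced splitting described in the abstract above can be illustrated with a minimal sketch, assuming a one-dimensional key space and a list of historical point coordinates standing in for past data; the function names and the equal-count quantile rule are illustrative, not the actual SIDI construction, which indexes multi-dimensional spatial data across machines.

```python
# Minimal sketch of density-based space partitioning (not the actual SIDI code).
# Idea: use a historical sample of point coordinates to choose split boundaries
# so that each partition receives roughly the same number of points.

def density_based_splits(historical_xs, num_partitions):
    """Return split boundaries so each partition holds ~equal numbers of points."""
    xs = sorted(historical_xs)
    step = len(xs) / num_partitions
    # Boundaries are taken at equal-count quantiles of the historical sample.
    return [xs[int(i * step)] for i in range(1, num_partitions)]

def assign_partition(x, boundaries):
    """Route a new point to the partition whose range contains it."""
    for i, b in enumerate(boundaries):
        if x < b:
            return i
    return len(boundaries)

if __name__ == "__main__":
    import random
    random.seed(0)
    # Skewed historical data: most points cluster near 0 (e.g. a dense city centre).
    history = [random.expovariate(1.0) for _ in range(10_000)]
    bounds = density_based_splits(history, num_partitions=4)
    counts = [0] * 4
    for x in (random.expovariate(1.0) for _ in range(10_000)):
        counts[assign_partition(x, bounds)] += 1
    print("split boundaries:", [round(b, 2) for b in bounds])
    print("points per partition:", counts)  # roughly balanced despite the skew
```

Because the boundaries come from the historical sample, newly arriving points with a similar distribution spread roughly evenly over the partitions, which is the load-balancing property the abstract relies on.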
Efficient and Scalable Workflows for Genomic Analyses
Subho Sankar Banerjee, A. Athreya, L. S. Mainzer, C. Jongeneel, Wen-mei W. Hwu, Z. Kalbarczyk, R. Iyer
DOI: 10.1145/2912152.2912156 (https://doi.org/10.1145/2912152.2912156) | Published: 2016-06-01
Abstract: Recent growth in the volume of DNA sequence data and the associated computational cost of extracting meaningful information warrants the need for efficient computational systems at scale. In this work, we propose the Illinois Genomics Execution Environment (IGen), a framework for efficient and scalable genome analyses. The design philosophy of IGen is based on algorithmic analysis and extensive measurements of compute- and data-intensive genomic analysis workflows (such as variant discovery and genotyping analysis) executed on high-performance and cloud computing infrastructures. IGen leverages the advantages of existing designs and proposes new software improvements to overcome the inefficiencies we observe in our measurements. Based on these composite improvements, we demonstrate that IGen accelerates alignment from 13.1 hours to 10.8 hours (1.2x) and variant calling from 10.1 hours to 1.25 hours (8x) on a single node, and that its modular design scales efficiently in a parallel computing environment.
Citations: 12
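As a quick sanity check on the figures quoted above, the speedup factors follow directly from the before/after runtimes; the runtimes below are taken from the abstract, and everything else is illustrative.

```python
# Verify the speedup factors quoted in the abstract: speedup = old_time / new_time.
timings_hours = {
    "alignment": (13.1, 10.8),        # (before IGen, with IGen)
    "variant calling": (10.1, 1.25),
}

for stage, (before, after) in timings_hours.items():
    speedup = before / after
    print(f"{stage}: {before} h -> {after} h, speedup = {speedup:.2f}x")
# alignment: 13.1 h -> 10.8 h, speedup = 1.21x  (~1.2x)
# variant calling: 10.1 h -> 1.25 h, speedup = 8.08x  (~8x)
```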
Persistent Data Staging Services for Data Intensive In-situ Scientific Workflows
Melissa Romanus, Fan Zhang, Tong Jin, Qian Sun, H. Bui, M. Parashar, J. Choi, Saloman Janhunen, R. Hager, S. Klasky, Choong-Seock Chang, I. Rodero
DOI: 10.1145/2912152.2912157 (https://doi.org/10.1145/2912152.2912157) | Published: 2016-06-01
Abstract: Scientific simulation workflows executing on very large scale computing systems are essential modalities for scientific investigation. The increasing scale and resolution of these simulations provide new opportunities for accurately modeling complex natural and engineered phenomena. However, the increasing complexity necessitates managing, transporting, and processing unprecedented amounts of data, and as a result researchers are increasingly exploring data staging and in-situ workflows to reduce data movement and data-related overheads. As these workflows become more dynamic in their structures and behaviors, data staging and in-situ solutions must evolve to support new requirements. In this paper, we explore how the service-oriented concept can be applied to extreme-scale in-situ workflows. Specifically, we explore persistent data staging as a service and present the design and implementation of DataSpaces as a Service, a service-oriented data staging framework. We use a dynamically coupled fusion simulation workflow to illustrate the capabilities of this framework and evaluate its performance and scalability.
Citations: 7
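To make the staging-as-a-service idea concrete, a minimal sketch of a shared staging area with put/get semantics follows; the class, method names, and in-process dictionary store are assumptions for illustration and not the DataSpaces API, which keeps data in the memory of dedicated staging servers shared by coupled applications.

```python
# Minimal sketch of a persistent staging area with put/get semantics.
# Illustrative only: a real staging service holds data on dedicated staging
# nodes and serves many coupled applications, rather than a single process.
import threading

class StagingArea:
    """In-memory key/value store shared by producer and consumer components."""

    def __init__(self):
        self._store = {}
        self._data_ready = threading.Condition()

    def put(self, name, version, data):
        """A simulation publishes a named, versioned data object."""
        with self._data_ready:
            self._store[(name, version)] = data
            self._data_ready.notify_all()

    def get(self, name, version, timeout=None):
        """An in-situ analysis component blocks until the object is available."""
        with self._data_ready:
            while (name, version) not in self._store:
                if not self._data_ready.wait(timeout):
                    raise TimeoutError(f"{name} v{version} not staged")
            return self._store[(name, version)]

if __name__ == "__main__":
    staging = StagingArea()
    staging.put("temperature_field", version=0, data=[300.1, 301.7, 299.8])
    print(staging.get("temperature_field", version=0))
```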
Minimising the Execution of Unknown Bag-of-Task Jobs with Deadlines on the Cloud
Long Thai, B. Varghese, A. Barker
DOI: 10.1145/2912152.2912153 (https://doi.org/10.1145/2912152.2912153) | Published: 2016-06-01
Abstract: Scheduling jobs with deadlines, each of which defines the latest time by which a job must be completed, can be challenging on the cloud due to the costs incurred and unpredictable performance. The problem is further complicated when there is not enough information to schedule a job so that its deadline is satisfied and its cost is minimised. In this paper, we present an approach for scheduling jobs, whose performance is unknown before execution, with deadlines on the cloud. By performing a sampling phase to collect the necessary information about those jobs, our approach delivers scheduling decisions within 10% of the cost and 16% of the violation rate of an ideal setting that has complete knowledge of each job from the beginning. Finally, our proposed algorithm outperforms existing approaches that use a fixed amount of resources, reducing the violation cost by at least a factor of two.
Citations: 4
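A minimal sketch of the sampling idea described above: run a small fraction of the tasks first to estimate the mean task runtime, then size the number of cloud instances so the remaining work fits within the deadline. The sampling fraction, function names, and the assumption of identical instances are illustrative and not the paper's actual algorithm.

```python
# Sampling-based sizing for a bag-of-tasks job with a deadline (illustrative).
import math
import random

def estimate_task_runtime(tasks, run_task, sample_fraction=0.05):
    """Run a small sample of tasks and return (mean runtime in seconds, sample size)."""
    sample_size = max(1, int(len(tasks) * sample_fraction))
    sample = random.sample(tasks, sample_size)
    runtimes = [run_task(t) for t in sample]
    return sum(runtimes) / len(runtimes), sample_size

def instances_needed(num_remaining_tasks, mean_runtime, time_left):
    """Smallest number of identical instances that finishes the remaining work in time."""
    total_work = num_remaining_tasks * mean_runtime
    return max(1, math.ceil(total_work / time_left))

if __name__ == "__main__":
    random.seed(1)
    tasks = list(range(1000))
    # Stand-in for actually executing a task: runtime ~ 60 s with some noise.
    simulate = lambda t: random.gauss(60.0, 10.0)
    mean_rt, sampled = estimate_task_runtime(tasks, simulate)
    deadline_seconds = 2 * 3600               # job must finish within 2 hours
    time_spent_sampling = sampled * mean_rt   # sample ran sequentially on one instance
    n = instances_needed(len(tasks) - sampled, mean_rt,
                         deadline_seconds - time_spent_sampling)
    print(f"estimated runtime {mean_rt:.1f}s per task -> rent {n} instances")
```

The trade-off the abstract quantifies is visible here: the sampling phase costs time and money up front, but without it there is no runtime estimate on which to base a deadline-aware resource decision.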
Towards Convergence of Extreme Computing and Big Data Centers
S. Matsuoka
DOI: 10.1145/2912152.2912159 (https://doi.org/10.1145/2912152.2912159) | Published: 2016-06-01
Abstract: Rapid growth in the use cases and demands for extreme computing and huge data processing is leading to the convergence of the two infrastructures. Tokyo Tech.'s TSUBAME3.0, a 2017 addition to the highly successful TSUBAME2.5, will deploy a series of innovative technologies, including ultra-efficient liquid cooling and power control, petabytes of non-volatile memory, and a low-cost Petabit-class interconnect. To address the challenges of adopting such technologies, proper system architectures, software stacks, and algorithms must be designed and developed; these are being addressed by several of our ongoing research projects and prototypes, such as the TSUBAME-KFC/DL prototype, which twice in a row ranked #1 in the world in power efficiency on the Green500; the Billion-way Resiliency project, which is investigating effective methods for future resilient supercomputers; and the Extreme Big Data (EBD) project, which is pursuing co-design of a convergent system stack for future extreme data and computing workloads. We have already developed various algorithms and software substrates to manipulate big-data elements such as graphs, tables (sort), trees, and files directly on extreme-scale supercomputers, and in fact ranked #1 in the world on the Graph 500 twice, including the latest November 2015 list. Our recent focus is also on supporting new workloads for categorizing big data, represented by deep learning; here we are collaborating with several partners such as DENSO to improve the scalability and predictability of such workloads, and a recent trial scaled to 1,146 GPUs for an entire week on a CNN workload. For TSUBAME3 and 2.5 combined, we expect to increase such capabilities to over 80 petaflops in early 2017, or 7 times faster than the K computer.
Citations: 0
Experiences with Performing MapReduce Analysis of Scientific Data on HPC Platforms
Diana Moise
DOI: 10.1145/2912152.2912154 (https://doi.org/10.1145/2912152.2912154) | Published: 2016-06-01
Abstract: The growing interest in applying Big Data techniques to scientific data generated by HPC simulations raises the question of whether this is achievable on the same HPC platform and, if so, what performance can be obtained on these systems. The motivation behind this approach is twofold: scientific datasets are often very large and would take a long time to transfer to external Big Data clusters; furthermore, the ability to perform live analysis on the data as it is being generated on the HPC platform can be crucial to many scientific applications. Using as a case study a Hadoop-based application that analyzes Molecular Dynamics simulation data on the same HPC platform on which it was produced, we present our experiences with performing Big Data analysis on an HPC system. This work also describes the challenges one has to deal with when performing Hadoop-based computations on scientific data on HPC platforms: data storage, data formats, ingesting data into Hadoop, and optimizing the deployment to overcome the limitations of the HPC environment. Our work shows, in a first phase, that such an instantiation of Big Data analysis on an HPC system is both relevant and feasible; in a second phase, we greatly improve performance through efficient configuration of the HPC resources and tuning of the application. Our findings can be shared as best practices and recommendations in the context of the convergence of the HPC and Big Data environments.
Citations: 1
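As an illustration of the kind of Hadoop-based analysis discussed above, the sketch below shows a Hadoop Streaming mapper/reducer pair that averages a per-frame quantity from tab-separated simulation records. The record layout (frame_id, value) is a hypothetical stand-in for real Molecular Dynamics output, and the scripts are not taken from the paper.

```python
#!/usr/bin/env python3
# Hadoop Streaming sketch: average a per-frame value from tab-separated records.
# Hypothetical input format: frame_id \t value   (one record per line).
# Illustrative invocation:
#   hadoop jar hadoop-streaming.jar -input md_data -output md_means \
#       -mapper "md_stats.py map" -reducer "md_stats.py reduce"
import sys

def mapper():
    # Emit key/value pairs as "frame_id \t value"; malformed records are skipped.
    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        frame_id, value = parts
        print(f"{frame_id}\t{value}")

def reducer():
    # Input arrives sorted by key; average the values seen for each frame.
    current_frame, total, count = None, 0.0, 0

    def emit():
        if current_frame is not None and count:
            print(f"{current_frame}\t{total / count:.4f}")

    for line in sys.stdin:
        frame_id, value = line.rstrip("\n").split("\t")
        if frame_id != current_frame:
            emit()
            current_frame, total, count = frame_id, 0.0, 0
        total += float(value)
        count += 1
    emit()

if __name__ == "__main__":
    {"map": mapper, "reduce": reducer}[sys.argv[1]]()
```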
Rethinking High Performance Computing Platforms: Challenges, Opportunities and Recommendations
Ole Weidner, M. Atkinson, A. Barker, Rosa Filgueira Vicente
DOI: 10.1145/2912152.2912155 (https://doi.org/10.1145/2912152.2912155) | Published: 2016-06-01
Abstract: A growing number of "second generation" high-performance computing applications with heterogeneous, dynamic and data-intensive properties have an extended set of requirements covering application deployment, resource allocation and control, and I/O scheduling. These requirements are not met by current production HPC platform models and policies. This results in a loss of opportunity, productivity and innovation for new computational methods and tools. It also decreases effective system utilization for platform providers, due to unsupervised workarounds and "rogue" resource-management strategies implemented in application space. In this paper we critically discuss the dominant HPC platform model and describe the challenges it creates for second-generation applications because of its asymmetric resource view, interfaces and software deployment policies. We present an extended, more symmetric and application-centric platform model that adds decentralized deployment, introspection, bidirectional control and information flow, and more comprehensive resource scheduling. We describe cHPC, an early prototype of a non-disruptive implementation based on Linux Containers (LXC). It can operate alongside existing batch queuing systems and exposes a symmetric platform API without interfering with existing applications and usage modes. We see our approach as a viable, incremental next step in HPC platform evolution that benefits applications and platform providers alike. To demonstrate this further, we lay out a roadmap for future research and experimental evaluation.
Citations: 19
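To illustrate what a more symmetric, application-facing platform API might look like, the toy sketch below lets an application introspect its allocation and request adjustments at runtime. The class and method names are invented for illustration and are not the cHPC API; a real implementation would back these calls with container/cgroup limits and a platform-side policy engine.

```python
# Toy sketch of an application-facing platform API with introspection and
# bidirectional control. Names are illustrative, not the actual cHPC interface.
from dataclasses import dataclass, field

@dataclass
class Allocation:
    cores: int
    memory_gb: int
    io_bandwidth_mbs: int

@dataclass
class PlatformSession:
    """Handle an application holds to observe and (re)negotiate its resources."""
    allocation: Allocation
    events: list = field(default_factory=list)

    def introspect(self) -> Allocation:
        # A real platform would query the container / cgroup limits here.
        return self.allocation

    def request(self, **changes) -> bool:
        # Bidirectional control: the application asks, the platform may refuse.
        granted = all(v <= 2 * getattr(self.allocation, k) for k, v in changes.items())
        if granted:
            for k, v in changes.items():
                setattr(self.allocation, k, v)
        self.events.append(("request", changes, granted))
        return granted

if __name__ == "__main__":
    session = PlatformSession(Allocation(cores=16, memory_gb=64, io_bandwidth_mbs=500))
    print("current allocation:", session.introspect())
    # The application notices an I/O-heavy phase and asks for more bandwidth.
    print("granted:", session.request(io_bandwidth_mbs=800))
    print("new allocation:", session.introspect())
```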
Session details: Keynote Address
E. Yildirim
DOI: 10.1145/3260990 (https://doi.org/10.1145/3260990) | Published: 2014-06-23
Citations: 0
Session details: Session I
E. Yildirim
DOI: 10.1145/3260991 (https://doi.org/10.1145/3260991) | Published: 2014-06-23
Citations: 0