Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing: Latest Publications

Session details: Session II
E. Yildirim
DOI: 10.1145/3260992 (https://doi.org/10.1145/3260992) | Published: 2016-06-01
Citations: 0
SIDI: A Scalable in-Memory Density-based Index for Spatial Databases
D. Nguyen, K. Doan, T. Pham
DOI: 10.1145/2912152.2912158 (https://doi.org/10.1145/2912152.2912158) | Published: 2016-06-01
Abstract: With the widespread use of location-based services, spatial data is becoming increasingly common. Because the data is usually huge in volume and arrives at the storage continuously in real time, designing systems that store this type of data efficiently is challenging. Two major issues that complicate building such a system are the skewed distribution of the data and the need to scale the storage across multiple machines. In this paper, we propose a novel scalable in-memory density-based index for spatial databases. The key principle underlying our design is the exploitation of the stable spatial distribution of the datasets to deploy a simple but efficient index structure. We use information extracted from past data to split the entire space into independent pieces of similar density, ensuring load balancing and scalability. Experimental results show that the proposed solution scales well in a distributed environment and outperforms common indexes in many cases.
Citations: 4
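The density-balanced splitting described in the abstract above can be illustrated with a minimal sketch, assuming a one-dimensional key space and a list of historical point coordinates standing in for past data; the function names and the equal-count quantile rule are illustrative, not the actual SIDI construction, which indexes multi-dimensional spatial data across machines.

```python
# Minimal sketch of density-based space partitioning (not the actual SIDI code).
# Idea: use a historical sample of point coordinates to choose split boundaries
# so that each partition receives roughly the same number of points.

def density_based_splits(historical_xs, num_partitions):
    """Return split boundaries so each partition holds ~equal numbers of points."""
    xs = sorted(historical_xs)
    step = len(xs) / num_partitions
    # Boundaries are taken at equal-count quantiles of the historical sample.
    return [xs[int(i * step)] for i in range(1, num_partitions)]

def assign_partition(x, boundaries):
    """Route a new point to the partition whose range contains it."""
    for i, b in enumerate(boundaries):
        if x < b:
            return i
    return len(boundaries)

if __name__ == "__main__":
    import random
    random.seed(0)
    # Skewed historical data: most points cluster near 0 (e.g. a dense city centre).
    history = [random.expovariate(1.0) for _ in range(10_000)]
    bounds = density_based_splits(history, num_partitions=4)
    counts = [0] * 4
    for x in (random.expovariate(1.0) for _ in range(10_000)):
        counts[assign_partition(x, bounds)] += 1
    print("split boundaries:", [round(b, 2) for b in bounds])
    print("points per partition:", counts)  # roughly balanced despite the skew
```

Because the boundaries come from the historical sample, newly arriving points with a similar distribution spread roughly evenly over the partitions, which is the load-balancing property the abstract relies on.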
Efficient and Scalable Workflows for Genomic Analyses
Subho Sankar Banerjee, A. Athreya, L. S. Mainzer, C. Jongeneel, Wen-mei W. Hwu, Z. Kalbarczyk, R. Iyer
DOI: 10.1145/2912152.2912156 (https://doi.org/10.1145/2912152.2912156) | Published: 2016-06-01
Abstract: Recent growth in the volume of DNA sequence data and the associated computational cost of extracting meaningful information warrants the need for efficient computational systems at scale. In this work, we propose the Illinois Genomics Execution Environment (IGen), a framework for efficient and scalable genome analyses. The design philosophy of IGen is based on algorithmic analysis and extensive measurements of compute- and data-intensive genomic analysis workflows (such as variant discovery and genotyping analysis) executed on high-performance and cloud computing infrastructures. IGen leverages the advantages of existing designs and proposes new software improvements to overcome the inefficiencies we observe in our measurements. Based on these composite improvements, we demonstrate that IGen accelerates alignment from 13.1 hours to 10.8 hours (1.2x) and variant calling from 10.1 hours to 1.25 hours (8x) on a single node, and that its modular design scales efficiently in a parallel computing environment.
Citations: 12
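As a quick sanity check on the figures quoted above, the speedup factors follow directly from the before/after runtimes; the runtimes below are taken from the abstract, and everything else is illustrative.

```python
# Verify the speedup factors quoted in the abstract: speedup = old_time / new_time.
timings_hours = {
    "alignment": (13.1, 10.8),        # (before IGen, with IGen)
    "variant calling": (10.1, 1.25),
}

for stage, (before, after) in timings_hours.items():
    speedup = before / after
    print(f"{stage}: {before} h -> {after} h, speedup = {speedup:.2f}x")
# alignment: 13.1 h -> 10.8 h, speedup = 1.21x  (~1.2x)
# variant calling: 10.1 h -> 1.25 h, speedup = 8.08x  (~8x)
```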
Persistent Data Staging Services for Data Intensive In-situ Scientific Workflows
Melissa Romanus, Fan Zhang, Tong Jin, Qian Sun, H. Bui, M. Parashar, J. Choi, Saloman Janhunen, R. Hager, S. Klasky, Choong-Seock Chang, I. Rodero
DOI: 10.1145/2912152.2912157 (https://doi.org/10.1145/2912152.2912157) | Published: 2016-06-01
Abstract: Scientific simulation workflows executing on very large scale computing systems are essential modalities for scientific investigation. The increasing scale and resolution of these simulations provide new opportunities for accurately modeling complex natural and engineered phenomena. However, the increasing complexity necessitates managing, transporting, and processing unprecedented amounts of data, and as a result researchers are increasingly exploring data staging and in-situ workflows to reduce data movement and data-related overheads. As these workflows become more dynamic in their structures and behaviors, data staging and in-situ solutions must evolve to support new requirements. In this paper, we explore how the service-oriented concept can be applied to extreme-scale in-situ workflows. Specifically, we explore persistent data staging as a service and present the design and implementation of DataSpaces as a Service, a service-oriented data staging framework. We use a dynamically coupled fusion simulation workflow to illustrate the capabilities of this framework and evaluate its performance and scalability.
Citations: 7
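To make the staging-as-a-service idea concrete, a minimal sketch of a shared staging area with put/get semantics follows; the class, method names, and in-process dictionary store are assumptions for illustration and not the DataSpaces API, which keeps data in the memory of dedicated staging servers shared by coupled applications.

```python
# Minimal sketch of a persistent staging area with put/get semantics.
# Illustrative only: a real staging service holds data on dedicated staging
# nodes and serves many coupled applications, rather than a single process.
import threading

class StagingArea:
    """In-memory key/value store shared by producer and consumer components."""

    def __init__(self):
        self._store = {}
        self._data_ready = threading.Condition()

    def put(self, name, version, data):
        """A simulation publishes a named, versioned data object."""
        with self._data_ready:
            self._store[(name, version)] = data
            self._data_ready.notify_all()

    def get(self, name, version, timeout=None):
        """An in-situ analysis component blocks until the object is available."""
        with self._data_ready:
            while (name, version) not in self._store:
                if not self._data_ready.wait(timeout):
                    raise TimeoutError(f"{name} v{version} not staged")
            return self._store[(name, version)]

if __name__ == "__main__":
    staging = StagingArea()
    staging.put("temperature_field", version=0, data=[300.1, 301.7, 299.8])
    print(staging.get("temperature_field", version=0))
```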
Minimising the Execution of Unknown Bag-of-Task Jobs with Deadlines on the Cloud
Long Thai, B. Varghese, A. Barker
DOI: 10.1145/2912152.2912153 (https://doi.org/10.1145/2912152.2912153) | Published: 2016-06-01
Abstract: Scheduling jobs with deadlines, each of which defines the latest time by which a job must be completed, can be challenging on the cloud due to the costs incurred and unpredictable performance. The problem is further complicated when there is not enough information to schedule a job so that its deadline is satisfied and its cost is minimised. In this paper, we present an approach for scheduling jobs, whose performance is unknown before execution, with deadlines on the cloud. By performing a sampling phase to collect the necessary information about those jobs, our approach delivers scheduling decisions within 10% of the cost and 16% of the violation rate of an ideal setting that has complete knowledge of each job from the beginning. Finally, our proposed algorithm outperforms existing approaches that use a fixed amount of resources, reducing the violation cost by at least a factor of two.
Citations: 4
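A minimal sketch of the sampling idea described above: run a small fraction of the tasks first to estimate the mean task runtime, then size the number of cloud instances so the remaining work fits within the deadline. The sampling fraction, function names, and the assumption of identical instances are illustrative and not the paper's actual algorithm.

```python
# Sampling-based sizing for a bag-of-tasks job with a deadline (illustrative).
import math
import random

def estimate_task_runtime(tasks, run_task, sample_fraction=0.05):
    """Run a small sample of tasks and return (mean runtime in seconds, sample size)."""
    sample_size = max(1, int(len(tasks) * sample_fraction))
    sample = random.sample(tasks, sample_size)
    runtimes = [run_task(t) for t in sample]
    return sum(runtimes) / len(runtimes), sample_size

def instances_needed(num_remaining_tasks, mean_runtime, time_left):
    """Smallest number of identical instances that finishes the remaining work in time."""
    total_work = num_remaining_tasks * mean_runtime
    return max(1, math.ceil(total_work / time_left))

if __name__ == "__main__":
    random.seed(1)
    tasks = list(range(1000))
    # Stand-in for actually executing a task: runtime ~ 60 s with some noise.
    simulate = lambda t: random.gauss(60.0, 10.0)
    mean_rt, sampled = estimate_task_runtime(tasks, simulate)
    deadline_seconds = 2 * 3600               # job must finish within 2 hours
    time_spent_sampling = sampled * mean_rt   # sample ran sequentially on one instance
    n = instances_needed(len(tasks) - sampled, mean_rt,
                         deadline_seconds - time_spent_sampling)
    print(f"estimated runtime {mean_rt:.1f}s per task -> rent {n} instances")
```

The trade-off the abstract quantifies is visible here: the sampling phase costs time and money up front, but without it there is no runtime estimate on which to base a deadline-aware resource decision.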
Towards Convergence of Extreme Computing and Big Data Centers
S. Matsuoka
DOI: 10.1145/2912152.2912159 (https://doi.org/10.1145/2912152.2912159) | Published: 2016-06-01
Abstract: Rapid growth in the use cases and demands for extreme computing and huge data processing is leading to the convergence of the two infrastructures. Tokyo Tech.'s TSUBAME3.0, a 2017 addition to the highly successful TSUBAME2.5, will deploy a series of innovative technologies, including ultra-efficient liquid cooling and power control, petabytes of non-volatile memory, and a low-cost Petabit-class interconnect. To address the challenges of adopting such technologies, proper system architectures, software stacks, and algorithms must be designed and developed; these are being addressed by several of our ongoing research projects and prototypes, such as the TSUBAME-KFC/DL prototype, which twice in a row ranked #1 in the world in power efficiency on the Green500; the Billion-way Resiliency project, which is investigating effective methods for future resilient supercomputers; and the Extreme Big Data (EBD) project, which is pursuing co-design of a convergent system stack for future extreme data and computing workloads. We have already developed various algorithms and software substrates to manipulate big-data elements such as graphs, tables (sort), trees, and files directly on extreme-scale supercomputers, and in fact ranked #1 in the world on the Graph 500 twice, including the latest November 2015 list. Our recent focus is also on supporting new workloads for categorizing big data, represented by deep learning; here we are collaborating with several partners such as DENSO to improve the scalability and predictability of such workloads, and a recent trial scaled to 1,146 GPUs for an entire week on a CNN workload. For TSUBAME3 and 2.5 combined, we expect to increase such capabilities to over 80 petaflops in early 2017, or 7 times faster than the K computer.
Citations: 0
Experiences with Performing MapReduce Analysis of Scientific Data on HPC Platforms
Diana Moise
DOI: 10.1145/2912152.2912154 (https://doi.org/10.1145/2912152.2912154) | Published: 2016-06-01
Abstract: The growing interest in applying Big Data techniques to scientific data generated by HPC simulations raises the question of whether this is achievable on the same HPC platform and, if so, what performance can be obtained on these systems. The motivation behind this approach is twofold: scientific datasets are often very large and would take a long time to transfer to external Big Data clusters; furthermore, the ability to perform live analysis on the data as it is being generated on the HPC platform can be crucial to many scientific applications. Using as a case study a Hadoop-based application that analyzes Molecular Dynamics simulation data on the same HPC platform on which it was produced, we present our experiences with performing Big Data analysis on an HPC system. This work also describes the challenges one has to deal with when performing Hadoop-based computations on scientific data on HPC platforms: data storage, data formats, ingesting data into Hadoop, and optimizing the deployment to overcome the limitations of the HPC environment. Our work shows, in a first phase, that such an instantiation of Big Data analysis on an HPC system is both relevant and feasible; in a second phase, we greatly improve performance through efficient configuration of the HPC resources and tuning of the application. Our findings can be shared as best practices and recommendations in the context of the convergence of the HPC and Big Data environments.
Citations: 1
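As an illustration of the kind of Hadoop-based analysis discussed above, the sketch below shows a Hadoop Streaming mapper/reducer pair that averages a per-frame quantity from tab-separated simulation records. The record layout (frame_id, value) is a hypothetical stand-in for real Molecular Dynamics output, and the scripts are not taken from the paper.

```python
#!/usr/bin/env python3
# Hadoop Streaming sketch: average a per-frame value from tab-separated records.
# Hypothetical input format: frame_id \t value   (one record per line).
# Illustrative invocation:
#   hadoop jar hadoop-streaming.jar -input md_data -output md_means \
#       -mapper "md_stats.py map" -reducer "md_stats.py reduce"
import sys

def mapper():
    # Emit key/value pairs as "frame_id \t value"; malformed records are skipped.
    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue
        frame_id, value = parts
        print(f"{frame_id}\t{value}")

def reducer():
    # Input arrives sorted by key; average the values seen for each frame.
    current_frame, total, count = None, 0.0, 0

    def emit():
        if current_frame is not None and count:
            print(f"{current_frame}\t{total / count:.4f}")

    for line in sys.stdin:
        frame_id, value = line.rstrip("\n").split("\t")
        if frame_id != current_frame:
            emit()
            current_frame, total, count = frame_id, 0.0, 0
        total += float(value)
        count += 1
    emit()

if __name__ == "__main__":
    {"map": mapper, "reduce": reducer}[sys.argv[1]]()
```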
Rethinking High Performance Computing Platforms: Challenges, Opportunities and Recommendations
Ole Weidner, M. Atkinson, A. Barker, Rosa Filgueira Vicente
DOI: 10.1145/2912152.2912155 (https://doi.org/10.1145/2912152.2912155) | Published: 2016-06-01
Abstract: A growing number of "second generation" high-performance computing applications with heterogeneous, dynamic and data-intensive properties have an extended set of requirements covering application deployment, resource allocation and control, and I/O scheduling. These requirements are not met by current production HPC platform models and policies. This results in a loss of opportunity, productivity and innovation for new computational methods and tools. It also decreases effective system utilization for platform providers, due to unsupervised workarounds and "rogue" resource-management strategies implemented in application space. In this paper we critically discuss the dominant HPC platform model and describe the challenges it creates for second-generation applications because of its asymmetric resource view, interfaces and software deployment policies. We present an extended, more symmetric and application-centric platform model that adds decentralized deployment, introspection, bidirectional control and information flow, and more comprehensive resource scheduling. We describe cHPC, an early prototype of a non-disruptive implementation based on Linux Containers (LXC). It can operate alongside existing batch queuing systems and exposes a symmetric platform API without interfering with existing applications and usage modes. We see our approach as a viable, incremental next step in HPC platform evolution that benefits applications and platform providers alike. To demonstrate this further, we lay out a roadmap for future research and experimental evaluation.
Citations: 19
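To illustrate what a more symmetric, application-facing platform API might look like, the toy sketch below lets an application introspect its allocation and request adjustments at runtime. The class and method names are invented for illustration and are not the cHPC API; a real implementation would back these calls with container/cgroup limits and a platform-side policy engine.

```python
# Toy sketch of an application-facing platform API with introspection and
# bidirectional control. Names are illustrative, not the actual cHPC interface.
from dataclasses import dataclass, field

@dataclass
class Allocation:
    cores: int
    memory_gb: int
    io_bandwidth_mbs: int

@dataclass
class PlatformSession:
    """Handle an application holds to observe and (re)negotiate its resources."""
    allocation: Allocation
    events: list = field(default_factory=list)

    def introspect(self) -> Allocation:
        # A real platform would query the container / cgroup limits here.
        return self.allocation

    def request(self, **changes) -> bool:
        # Bidirectional control: the application asks, the platform may refuse.
        granted = all(v <= 2 * getattr(self.allocation, k) for k, v in changes.items())
        if granted:
            for k, v in changes.items():
                setattr(self.allocation, k, v)
        self.events.append(("request", changes, granted))
        return granted

if __name__ == "__main__":
    session = PlatformSession(Allocation(cores=16, memory_gb=64, io_bandwidth_mbs=500))
    print("current allocation:", session.introspect())
    # The application notices an I/O-heavy phase and asks for more bandwidth.
    print("granted:", session.request(io_bandwidth_mbs=800))
    print("new allocation:", session.introspect())
```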
Session details: Keynote Address
E. Yildirim
DOI: 10.1145/3260990 (https://doi.org/10.1145/3260990) | Published: 2014-06-23
Citations: 0
Session details: Session I
E. Yildirim
DOI: 10.1145/3260991 (https://doi.org/10.1145/3260991) | Published: 2014-06-23
Citations: 0