International Symposium on Design and Implementation of Symbolic Computation Systems: Latest Publications

CRUCIBLE: towards unified secure on- and off-line analytics at scale
Peter Coetzee, S. Jarvis
DOI: 10.1145/2534645.2534649 | Published: 2013-11-18

Abstract: The burgeoning field of data science benefits from the application of a variety of analytic models and techniques to the oft-cited problems of large volume, high velocity data rates, and significant variety in data structure and semantics. Many approaches make use of common analytic techniques in either a streaming or batch processing paradigm.

This paper presents progress in developing a framework for the analysis of large-scale datasets using both of these pools of techniques in a unified manner. This includes: (1) a Domain Specific Language (DSL) for describing analyses as a set of Communicating Sequential Processes, fully integrated with the Java type system, including an Integrated Development Environment (IDE) and a compiler which builds idiomatic Java; (2) a runtime model for execution of an analytic in both streaming and batch environments; and (3) a novel approach to automated management of cell-level security labels, applied uniformly across all runtimes.

The paper concludes with a demonstration of the successful use of this system with a sample workload developed in (1), and an analysis of the performance characteristics of each of the runtimes described in (2).

Citations: 1
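The execution model named in (1), analyses as Communicating Sequential Processes integrated with the Java type system, can be pictured with plain java.util.concurrent primitives: sequential processes that share no state and interact only over typed channels. Below is a minimal sketch of that model in ordinary Java; the process names and the end-of-stream convention are invented here, and this is not the CRUCIBLE DSL or its generated code.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Two CSP-style processes connected by a bounded, typed channel.
public class CspSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(16);
        final String POISON = "\u0000EOF"; // in-band end-of-stream marker

        // Producer process: emits a fixed stream of records.
        Thread source = new Thread(() -> {
            try {
                for (String rec : new String[] {"alpha", "beta", "gamma"}) {
                    channel.put(rec);
                }
                channel.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumer process: transforms each record until end-of-stream.
        Thread sink = new Thread(() -> {
            try {
                for (String rec = channel.take(); !rec.equals(POISON); rec = channel.take()) {
                    System.out.println(rec.toUpperCase());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        sink.start();
        source.join();
        sink.join();
    }
}
```

A real pipeline would chain many such processes; the streaming/batch distinction then reduces to what feeds the first channel.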
SDAFT: a novel scalable data access framework for parallel BLAST
Jiangling Yin, Junyao Zhang, Jun Wang, Wu-chun Feng
DOI: 10.1145/2534645.2534647 | Published: 2013-11-18

Abstract: To run search tasks in a parallel and load-balanced fashion, existing parallel BLAST schemes such as mpiBLAST introduce a data-initialization stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, in today's big-data era a rapidly growing sequence database becomes too large to move across the network.

In this paper, we develop a Scalable Data Access Framework (SDAFT) to solve this problem. It employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. SDAFT consists of two interlocked components: (1) a data-centric load-balanced scheduler (DC-scheduler) to enforce data-process locality, and (2) a translation layer to translate conventional parallel I/O operations into HDFS I/O. In experiments with our SDAFT prototype on real-world databases and queries across a wide variety of computing platforms, we found that SDAFT reduces I/O cost by a factor of 4 to 10 and doubles overall execution performance compared with existing schemes.

Citations: 10
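The translation layer in component (2) maps conventional parallel I/O calls (open, seek, read) onto HDFS client calls. As a point of reference, the sketch below shows what the HDFS side of such a mapping looks like with the stock Hadoop FileSystem API, along with the block-location query a data-centric scheduler like the DC-scheduler would consult for locality. The namenode address and file path are placeholders; this illustrates the target API, not SDAFT's code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder address
        try (FileSystem fs = FileSystem.get(conf)) {
            Path frag = new Path("/blastdb/fragment-0042"); // placeholder path

            // Block locations are what a data-centric scheduler consults
            // to co-locate a search task with the fragment it reads.
            FileStatus st = fs.getFileStatus(frag);
            for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
                System.out.println(loc);
            }

            // The HDFS equivalent of an lseek()+read() in the
            // conventional parallel I/O path.
            byte[] buf = new byte[4096];
            try (FSDataInputStream in = fs.open(frag)) {
                in.seek(8192);
                int n = in.read(buf, 0, buf.length);
                System.out.println("read " + n + " bytes");
            }
        }
    }
}
```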
Novel parallel method for mining frequent patterns on multi-core shared memory systems
Lan Vu, G. Alaghband
DOI: 10.1145/2534645.2534653 | Published: 2013-11-18

Abstract: Frequent pattern mining is an important problem in data mining with many practical applications. Current parallel methods for mining frequent patterns perform unstably across different database types and under-utilize the benefits of multi-core shared-memory machines. We present ShaFEM, a novel parallel frequent pattern mining method, to address these issues. Our method dynamically adapts to the data characteristics to perform efficiently on both sparse and dense databases. Its lock-free parallel mining approach minimizes synchronization needs and maximizes data independence to enhance scalability. Its structure lends itself well to dynamic job scheduling, resulting in a well-balanced load on new multi-core shared-memory architectures. We evaluate ShaFEM on a 12-core multi-socket server and find that our method runs 2.1--5.8 times faster than the state-of-the-art parallel method. For some test cases, ShaFEM saves 4.9 days and 12.8 hours of execution time over the compared method.

Citations: 16
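The abstract does not give ShaFEM's algorithm, but the low-synchronization style it describes is easy to illustrate on the first phase any FP-growth-style miner performs: counting item supports. That phase parallelizes over transactions with only low-contention per-item counters, as in the hypothetical sketch below (toy data; not ShaFEM itself).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class SupportCount {
    public static void main(String[] args) {
        List<String[]> transactions = Arrays.asList(
            new String[] {"a", "b", "c"},
            new String[] {"a", "c"},
            new String[] {"b", "d"},
            new String[] {"a", "b", "c", "d"});
        int minSupport = 2;

        Map<String, LongAdder> support = new ConcurrentHashMap<>();
        transactions.parallelStream().forEach(txn -> {
            for (String item : txn) {
                // computeIfAbsent + LongAdder keeps contention low: threads
                // bump striped counters instead of serializing on a lock.
                support.computeIfAbsent(item, k -> new LongAdder()).increment();
            }
        });

        // Keep only items meeting the minimum support threshold.
        support.forEach((item, count) -> {
            if (count.sum() >= minSupport) {
                System.out.println(item + " -> " + count.sum());
            }
        });
    }
}
```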
A framework for an in-depth comparison of scale-up and scale-out
Michael Sevilla, I. Nassi, Kleoni Ioannidou, S. Brandt, C. Maltzahn
DOI: 10.1145/2534645.2534654 | Published: 2013-11-18

Abstract: When data grows too large, we scale to larger systems, either by scaling out or up. It is understood that scale-out and scale-up have different complexities and bottlenecks, but a thorough comparison of the two architectures is challenging because of the diversity of their programming interfaces, their significantly different system environments, and their sensitivity to workload specifics. In this paper, we propose a novel comparison framework based on MapReduce that accounts for the application, its requirements, and its input size by considering input, software, and hardware parameters. Part of this framework requires implementing scale-out properties on scale-up, and we discuss the complex trade-offs, interactions, and dependencies of these properties for two specific case studies (word count and sort). This work lays the foundation for future work in quantifying design decisions and in building a system that automatically compares architectures and selects the best one.

Citations: 14
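Word count, one of the paper's two case studies, is compact enough to state as a kernel. The illustrative sketch below runs it over an in-memory stream, i.e., the scale-up side; the map/group/reduce structure is what a scale-out MapReduce implementation would distribute across nodes. It is a stand-in for the workload, not the paper's framework.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCountKernel {
    public static void main(String[] args) {
        Stream<String> lines = Stream.of("the quick brown fox", "the lazy dog");
        Map<String, Long> counts = lines
            .flatMap(line -> Arrays.stream(line.split("\\s+")))  // "map"
            .collect(Collectors.groupingByConcurrent(            // "shuffle"
                Function.identity(), Collectors.counting()));    // "reduce"
        counts.forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```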
Design of an active storage cluster file system for DAG workflows
P. Donnelly, D. Thain
DOI: 10.1145/2534645.2534656 | Published: 2013-11-18

Abstract: We present the conceptual design of Confuga, a cluster file system designed to meet the needs of DAG-structured workflows. Today's premier cluster file system, Hadoop, is commonly used to support large peta-scale datasets on commodity hardware and to exploit active storage through Map-Reduce, a specific workflow pattern. Unfortunately, DAG-structured workflows have very different requirements from Map-Reduce workflows: whole-file access is standard and multiple dependencies are common. Confuga will meet these new requirements by replicating rather than striping files as in Hadoop, by exploiting DAG-structured workflow consistency semantics, and by permitting multiple dependencies in job descriptions. To the end user, Confuga will appear as a drop-in replacement for a batch system and a file system, combined into a single entity that can be invoked by existing workflow managers. In this paper, we describe the design philosophy of Confuga, sketch the major components of the system, and explain how the system will behave under expected workloads.

Citations: 4
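The job model this design implies can be sketched directly: each job names several whole-file dependencies, and the scheduler may dispatch a job only once every input file has been produced by an earlier job. The toy scheduler below encodes that rule; the job and file names are invented, and this is not Confuga's actual interface.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DagSketch {
    // A job with multiple whole-file inputs and outputs.
    record Job(String name, List<String> inputs, List<String> outputs) {}

    public static void main(String[] args) {
        List<Job> jobs = List.of(
            new Job("split",   List.of("genome.fa"),      List.of("part1", "part2")),
            new Job("align1",  List.of("part1"),          List.of("hits1")),
            new Job("align2",  List.of("part2"),          List.of("hits2")),
            new Job("combine", List.of("hits1", "hits2"), List.of("report")));

        Set<String> available = new HashSet<>(List.of("genome.fa")); // pre-existing input

        // Dispatch any job whose whole-file inputs all exist.
        Deque<Job> pending = new ArrayDeque<>(jobs);
        while (!pending.isEmpty()) {
            Job j = pending.poll();
            if (available.containsAll(j.inputs())) {
                System.out.println("run " + j.name());   // dispatch to a node
                available.addAll(j.outputs());           // outputs become available
            } else {
                pending.add(j);  // requeue until its dependencies materialize
            }
        }
    }
}
```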
Toward a data scalable solution for facilitating discovery of scientific data resources
Alan R. Chappell, Sutanay Choudhury, J. Feo, D. Haglin, Alessandro Morari, Sumit Purohit, K. Schuchardt, Antonino Tumeo, Jesse Weaver, Oreste Villa
DOI: 10.1145/2534645.2534655 | Published: 2013-11-18

Abstract: Science is increasingly motivated by the need to process larger quantities of data. It is facing severe challenges in data collection, management, and processing, so much so that the computational demands of "data scaling" are competing with, and in many fields surpassing, the traditional objective of decreasing processing time. Example domains with large datasets include astronomy, biology, genomics, climate/weather, and material sciences. This paper presents a real-world use case in which we wish to answer queries provided by domain scientists in order to facilitate discovery of relevant science resources. The problem is that the metadata for these science resources is very large and growing quickly, rapidly increasing the need for a data-scaling solution. We propose SGEM, a system designed for answering graph-based queries over large datasets on cluster architectures, and we report early results for our current capability.

Citations: 5
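A graph-based query over resource metadata can be pictured as pattern matching over subject-predicate-object triples. The sketch below runs one such pattern over an in-memory list; SGEM's point is answering the same kind of query over metadata far too large for this, on cluster architectures. The triples and the query are invented examples.

```java
import java.util.List;

public class TripleQuery {
    record Triple(String subject, String predicate, String object) {}

    public static void main(String[] args) {
        List<Triple> metadata = List.of(
            new Triple("dataset:42", "hasTheme",  "climate"),
            new Triple("dataset:42", "hasFormat", "netCDF"),
            new Triple("dataset:77", "hasTheme",  "genomics"));

        // Query: ?d hasTheme "climate"  ->  which datasets cover climate?
        metadata.stream()
                .filter(t -> t.predicate().equals("hasTheme")
                          && t.object().equals("climate"))
                .forEach(t -> System.out.println(t.subject()));
    }
}
```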
BDMPI: conquering BigData with small clusters using MPI
Dominique LaSalle, G. Karypis
DOI: 10.1145/2534645.2534652 | Published: 2013-11-18

Abstract: Processing massive amounts of data on clusters with a finite amount of memory has become an important problem facing the parallel/distributed computing community. While MapReduce-style technologies provide an effective means for addressing various problems that fit within the MapReduce paradigm, there are many classes of problems for which this paradigm is ill-suited. In this paper we present a runtime system for traditional MPI programs that enables the efficient and transparent disk-based execution of distributed-memory parallel programs. This system, called BDMPI, leverages the semantics of MPI's API to orchestrate the execution of a large number of MPI processes on far fewer compute nodes, so that the running processes maximize the amount of computation they perform on the data fetched from disk. BDMPI enables the development of efficient disk-based, distributed-memory parallel codes without the high engineering and algorithmic complexity associated with multiple levels of blocking. BDMPI achieves significantly better performance than existing technologies on a single node (GraphChi) as well as on a small cluster (Hadoop).

Citations: 3
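BDMPI's central idea, stripped of MPI specifics, is oversubscription with controlled residency: launch many more workers than the node can hold, but let only a few execute at a time, so each resident worker computes as much as possible on the data it fetched from disk before yielding. The generic Java sketch below mimics that orchestration with a bounded thread pool; BDMPI itself does this transparently for unmodified MPI processes, which this sketch does not attempt.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Oversubscribe {
    public static void main(String[] args) throws InterruptedException {
        int logicalWorkers = 16;  // "MPI ranks", far more than fit at once
        int residentSlots = 2;    // workers allowed to execute concurrently

        ExecutorService pool = Executors.newFixedThreadPool(residentSlots);
        for (int rank = 0; rank < logicalWorkers; rank++) {
            final int r = rank;
            pool.submit(() -> {
                // Stand-in for: load this rank's partition from disk,
                // compute on it, then block on communication and yield
                // the slot to the next runnable worker.
                System.out.println("rank " + r + " computing on its partition");
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```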
On the core affinity and file upload performance of Hadoop
Joong-Yeon Cho, Hyun-Wook Jin, Min Lee, K. Schwan
DOI: 10.1145/2534645.2534651 | Published: 2013-11-18

Abstract: The MapReduce programming model was introduced for big-data processing, where the data nodes perform both data storage and computation. We therefore need to understand the different resource requirements of data-storage and computation tasks and schedule them efficiently over multi-core processors. Core affinity defines a mapping between a set of cores and a given task. It can be decided based on the resource requirements of a task, because this mapping largely affects the efficiency of computation, memory, and I/O resource utilization. In this paper, we analyze the impact of core affinity on the file upload performance of the Hadoop Distributed File System (HDFS). Our study provides insight into process scheduling issues on big-data processing systems. We also suggest a framework for dynamic core affinity based on our observations and show that a preliminary implementation can improve throughput by more than 40% compared with a default Linux system.

Citations: 3
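Operationally, setting core affinity means binding a task to an explicit set of cores. The JVM has no portable affinity API, so one common route on Linux is to launch the task under taskset; a dynamic-affinity framework like the one suggested above would choose the core list per task type rather than hard-coding it. The command, core mask, and child process below are illustrative.

```java
import java.io.IOException;

public class PinnedLaunch {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Pin the child process to cores 0-3 (e.g., keeping an I/O-heavy
        // upload task on one socket while compute tasks get the others).
        // taskset is the standard util-linux affinity tool; Linux-only.
        Process p = new ProcessBuilder("taskset", "-c", "0-3", "sleep", "1")
                .inheritIO()
                .start();
        System.exit(p.waitFor());
    }
}
```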
Large data and computation in a hazard map workflow using Hadoop and Netezza architectures
S. Rohit, A. Patra, V. Chaudhary
DOI: 10.1145/2534645.2534648 | Published: 2013-11-18

Abstract: Uncertainty quantification (UQ) using simulation ensembles leads to the twin challenges of managing large amounts of data and performing CPU-intensive computing. While algorithmic innovations using surrogates, localization, and parallelization can make the problem feasible, one is still left with very large data and compute tasks. Such integration of large-data analytics and computationally expensive tasks is increasingly common. We present an approach to solving this problem by using a mix of hardware and a workflow that maps tasks to appropriate hardware. We experiment with two computing environments: the first is an integration of a Netezza data warehouse appliance and a high-performance cluster, and the second is a Hadoop-based environment. Our approach is based on segregating the data-intensive and compute-intensive tasks and assigning the right architecture to each. We present the computing models and the new schemes in the context of generating probabilistic hazard maps using ensemble runs of the volcanic debris avalanche simulator TITAN2D and UQ methodology.

Citations: 0
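The workflow's key move is segregating tasks by kind and mapping each kind to the architecture that suits it. The toy router below makes that decision explicit; the task names and backend labels are invented stand-ins for the TITAN2D/Netezza/Hadoop pipeline the paper describes.

```java
import java.util.List;
import java.util.Map;

public class TaskRouter {
    enum Kind { DATA_INTENSIVE, COMPUTE_INTENSIVE }
    record Task(String name, Kind kind) {}

    public static void main(String[] args) {
        // Each class of task is assigned the hardware that suits it.
        Map<Kind, String> backend = Map.of(
            Kind.DATA_INTENSIVE,    "data warehouse / Hadoop",
            Kind.COMPUTE_INTENSIVE, "HPC cluster");

        List<Task> workflow = List.of(
            new Task("aggregate ensemble outputs", Kind.DATA_INTENSIVE),
            new Task("run TITAN2D sample",         Kind.COMPUTE_INTENSIVE),
            new Task("build hazard map surface",   Kind.DATA_INTENSIVE));

        workflow.forEach(t ->
            System.out.println(t.name() + "  ->  " + backend.get(t.kind())));
    }
}
```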