{"title":"Experimentation as a Tool for the Performance Evaluation of Big Data Systems","authors":"A. Apon","doi":"10.1145/2694730.2694734","DOIUrl":"https://doi.org/10.1145/2694730.2694734","url":null,"abstract":"The complex big data systems of today are difficult, if not impossible, to model analytically. The challenges of these distributed and parallel data processing systems include heterogeneous network communication, a mix of storage, memory, and computing devices, and common failures of communication and devices. Particular challenges with big data systems include the variety and volume of data that place previously unseen stresses on distributed computing systems. Experimentation using production-quality hardware and software and realistic data is required to understand system tradeoffs. At the same time, experimental evaluation has challenges, including access to hardware resources at scale, robust workload characterization, data characterization, configuration management of software and systems, and sometimes insidious optimization issues around the mix of software stacks or hardware/software resource allocation. In this talk we present a number of the research challenges when experimentation is used as a tool for the performance evaluation of big data systems, some approaches to solutions, and open questions for this area.","PeriodicalId":298926,"journal":{"name":"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126009930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John Klein, I. Gorton, Neil A. Ernst, P. Donohoe, Kim Pham, Chrisjan Matser
{"title":"Performance Evaluation of NoSQL Databases: A Case Study","authors":"John Klein, I. Gorton, Neil A. Ernst, P. Donohoe, Kim Pham, Chrisjan Matser","doi":"10.1145/2694730.2694731","DOIUrl":"https://doi.org/10.1145/2694730.2694731","url":null,"abstract":"The choice of a particular NoSQL database imposes a specific distributed software architecture and data model, and is a major determinant of the overall system throughput. NoSQL database performance is in turn strongly influenced by how well the data model and query capabilities fit the application use cases, and so system-specific testing and characterization is required. This paper presents a method and the results of a study that selected among three NoSQL databases for a large, distributed healthcare organization. While the method and study considered consistency, availability, and partition tolerance (CAP) tradeoffs, and other quality attributes that influence the selection decision, this paper reports on the performance evaluation method and results. In our testing, a typical workload and configuration produced throughput that varied from 225 to 3200 operations per second between database products, while read operation latency varied by a factor of 5 and write latency by a factor of 4 (with the highest throughput product delivering the highest latency). We also found that achieving strong consistency reduced throughput by 10-25% compared to eventual consistency.","PeriodicalId":298926,"journal":{"name":"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131945712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Keynote Address","authors":"Rekha Singhal","doi":"10.1145/3246731","DOIUrl":"https://doi.org/10.1145/3246731","url":null,"abstract":"","PeriodicalId":298926,"journal":{"name":"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134244948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Big Data Processing on Modern Clusters","authors":"D. Panda","doi":"10.1145/2694730.2694733","DOIUrl":"https://doi.org/10.1145/2694730.2694733","url":null,"abstract":"Modern clusters are having multi-/many-core architectures, high-performance rdma-enabled interconnects and SSD-based storage devices. Hadoop framework is extensively being used these days for Big Data processing. Spark framework is emerging for real-time analytics. Similarly, Memcached is being used in data centers with Web 2.0 environment. This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern clusters. An overview of RDMA-based designs for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark and Memcached will be presented. Performance benefits of these designs on various cluster configurations will be shown. The talk will also address the need for designing benchmarks using a multi-layered and systematic approach, which can be used to evaluate the performance of these middleware.","PeriodicalId":298926,"journal":{"name":"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132872660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Memory Sensitive SPEC CPU2006 Integer Benchmarks for Big Data Benchmarking","authors":"K. Hurt, E. John","doi":"10.1145/2694730.2694732","DOIUrl":"https://doi.org/10.1145/2694730.2694732","url":null,"abstract":"Benchmarking for Big Data is done at the system level, but with processors now being designed specifically for Cloud Computing and Big Data applications, optimization can now be done at the node level. The purpose of this work is to analyze three SPEC CPU2006 Integer benchmarks (libquantum, h264ref and hmmer) that were deemed \"highly memory sensitive\" in other works to determine their potential as Big Data processor benchmarks. Program characteristics like instruction count, instruction mix, locality, and memory footprint were analyzed. Through this preliminary analysis, these benchmarks were determined to be potential Big Data node-level benchmarks, but more analysis will have to be done in future work.","PeriodicalId":298926,"journal":{"name":"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132211028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Paper Session","authors":"Dheeraj Chahal","doi":"10.1145/3246732","DOIUrl":"https://doi.org/10.1145/3246732","url":null,"abstract":"","PeriodicalId":298926,"journal":{"name":"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133972934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems","authors":"Rekha Singhal, Dheeraj Chahal","doi":"10.1145/2694730","DOIUrl":"https://doi.org/10.1145/2694730","url":null,"abstract":"It is our great pleasure to welcome you to the 2015 ACM Workshop on Performance Analysis of Big Data Systems -- PABS'15 in conjunction with ICPE2015. \u0000 \u0000The main objective of the workshop is to discuss the performance challenges imposed by big data systems and the different state-of-the-art solutions proposed to overcome these challenges. The workshop aims at providing a platform for scientific researchers, academicians and practitioners to discuss techniques, models, benchmarks, tools and experiences while dealing with performance issues in big data systems. \u0000 \u0000The program committee reviewed 4 and accepted 2 full technical papers with acceptance rate as 50%. \u0000 \u0000We welcome attendees to attend the keynote, invited talk and paper presentations. These valuable and insightful talks can and will guide us to a better understanding of the future: \u0000Accelerating Big Data Processing on Modern Clusters, Prof. D.K. Panda (Ohio State University, USA) \u0000Experimentation as a Tool for the Performance Evaluation of Big Data Systems, Prof. Amy W. Apon (Clemson University, USA)","PeriodicalId":298926,"journal":{"name":"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133504500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}