Experimentation as a Tool for the Performance Evaluation of Big Data Systems
A. Apon
Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems
Published: 2015-02-01
DOI: 10.1145/2694730.2694734 (https://doi.org/10.1145/2694730.2694734)
Citation count: 0
Abstract
The complex big data systems of today are difficult, if not impossible, to model analytically. The challenges of these distributed and parallel data processing systems include heterogeneous network communication, a mix of storage, memory, and computing devices, and common failures of communication and devices. Particular challenges with big data systems include the variety and volume of data that place previously unseen stresses on distributed computing systems. Experimentation using production-quality hardware and software and realistic data is required to understand system tradeoffs. At the same time, experimental evaluation has challenges, including access to hardware resources at scale, robust workload characterization, data characterization, configuration management of software and systems, and sometimes insidious optimization issues around the mix of software stacks or hardware/software resource allocation. In this talk we present a number of the research challenges when experimentation is used as a tool for the performance evaluation of big data systems, some approaches to solutions, and open questions for this area.