{"title":"对现代分布式流媒体平台进行基准测试","authors":"Shilei Qian, Gang Wu, Jie Huang, Tathagata Das","doi":"10.1109/ICIT.2016.7474816","DOIUrl":null,"url":null,"abstract":"The prevalence of big data technology has generated increasing demands in large-scale streaming data processing. However, for certain tasks it is still challenging to appropriately select a platform due to the diversity of choices and the complexity of configurations. This paper focuses on benchmarking some principal streaming platforms. We achieve our goals on StreamBench, a streaming benchmark tool based on which we introduce proper modifications and extensions. We then accomplish performance comparisons among different big data platforms, including Apache Spark, Apache Storm and Apache Samza. In terms of performance criteria, we consider both computational capability and fault-tolerance ability. Finally, we give a summary on some key knobs for performance tuning as well as on hardware utilization.","PeriodicalId":116715,"journal":{"name":"2016 IEEE International Conference on Industrial Technology (ICIT)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":"{\"title\":\"Benchmarking modern distributed streaming platforms\",\"authors\":\"Shilei Qian, Gang Wu, Jie Huang, Tathagata Das\",\"doi\":\"10.1109/ICIT.2016.7474816\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The prevalence of big data technology has generated increasing demands in large-scale streaming data processing. However, for certain tasks it is still challenging to appropriately select a platform due to the diversity of choices and the complexity of configurations. This paper focuses on benchmarking some principal streaming platforms. We achieve our goals on StreamBench, a streaming benchmark tool based on which we introduce proper modifications and extensions. We then accomplish performance comparisons among different big data platforms, including Apache Spark, Apache Storm and Apache Samza. In terms of performance criteria, we consider both computational capability and fault-tolerance ability. Finally, we give a summary on some key knobs for performance tuning as well as on hardware utilization.\",\"PeriodicalId\":116715,\"journal\":{\"name\":\"2016 IEEE International Conference on Industrial Technology (ICIT)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"31\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE International Conference on Industrial Technology (ICIT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIT.2016.7474816\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Industrial Technology (ICIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIT.2016.7474816","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Benchmarking modern distributed streaming platforms
The prevalence of big data technology has generated increasing demands in large-scale streaming data processing. However, for certain tasks it is still challenging to appropriately select a platform due to the diversity of choices and the complexity of configurations. This paper focuses on benchmarking some principal streaming platforms. We achieve our goals on StreamBench, a streaming benchmark tool based on which we introduce proper modifications and extensions. We then accomplish performance comparisons among different big data platforms, including Apache Spark, Apache Storm and Apache Samza. In terms of performance criteria, we consider both computational capability and fault-tolerance ability. Finally, we give a summary on some key knobs for performance tuning as well as on hardware utilization.