{"title":"加快现代集群大数据处理","authors":"D. Panda","doi":"10.1145/2694730.2694733","DOIUrl":null,"url":null,"abstract":"Modern clusters are having multi-/many-core architectures, high-performance rdma-enabled interconnects and SSD-based storage devices. Hadoop framework is extensively being used these days for Big Data processing. Spark framework is emerging for real-time analytics. Similarly, Memcached is being used in data centers with Web 2.0 environment. This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern clusters. An overview of RDMA-based designs for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark and Memcached will be presented. Performance benefits of these designs on various cluster configurations will be shown. The talk will also address the need for designing benchmarks using a multi-layered and systematic approach, which can be used to evaluate the performance of these middleware.","PeriodicalId":298926,"journal":{"name":"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Accelerating Big Data Processing on Modern Clusters\",\"authors\":\"D. Panda\",\"doi\":\"10.1145/2694730.2694733\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern clusters are having multi-/many-core architectures, high-performance rdma-enabled interconnects and SSD-based storage devices. Hadoop framework is extensively being used these days for Big Data processing. Spark framework is emerging for real-time analytics. Similarly, Memcached is being used in data centers with Web 2.0 environment. This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern clusters. An overview of RDMA-based designs for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark and Memcached will be presented. Performance benefits of these designs on various cluster configurations will be shown. The talk will also address the need for designing benchmarks using a multi-layered and systematic approach, which can be used to evaluate the performance of these middleware.\",\"PeriodicalId\":298926,\"journal\":{\"name\":\"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2694730.2694733\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2694730.2694733","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Accelerating Big Data Processing on Modern Clusters
Modern clusters are having multi-/many-core architectures, high-performance rdma-enabled interconnects and SSD-based storage devices. Hadoop framework is extensively being used these days for Big Data processing. Spark framework is emerging for real-time analytics. Similarly, Memcached is being used in data centers with Web 2.0 environment. This talk will provide an overview of challenges in accelerating Hadoop, Spark and Memcached on modern clusters. An overview of RDMA-based designs for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark and Memcached will be presented. Performance benefits of these designs on various cluster configurations will be shown. The talk will also address the need for designing benchmarks using a multi-layered and systematic approach, which can be used to evaluate the performance of these middleware.