Recent Trends in Performance Modeling of Big Data Systems
V. Apte
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, 2017-04-18
DOI: 10.1145/3053600.3053621
With the advent of big data, generated by social media and by the digital footprints continuously left through mobile devices, special-purpose programming models were developed to make it easy to write programs that process such data. MapReduce, in particular its Hadoop implementation, is one of the most popular platforms for writing such programs. A MapReduce job consists of a "map" phase, in which many tasks run in parallel to perform intermediate processing of the data, and a "reduce" phase, in which many tasks again run in parallel to extract information from the processed data. Performance modeling of such systems requires approaches different from those used for traditional multi-threaded, multi-core systems that support Web applications, primarily because the dependencies and synchronization between tasks (for example, a reduce task cannot apply its reduce function until every map task has finished) are not easily expressed in standard queuing network models. In this talk we will review work by researchers addressing this modeling problem. This work spans first-principles calculations of job completion time, queuing network models, and simulation. We will review these efforts and highlight opportunities for further work in this area.
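As an illustration of what a first-principles completion-time calculation can look like, the sketch below bounds a job's completion time from per-task durations and slot counts. It applies the classic makespan bounds for greedy list scheduling (in the spirit of Graham's analysis); this is a common technique, not necessarily one of the specific models surveyed in the talk. It assumes a strict barrier between the map and reduce phases (ignoring shuffle overlap), and the names `PhaseBounds`, `phase_bounds`, and `job_bounds` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PhaseBounds:
    lower: float  # best case: total work spread perfectly over all slots
    upper: float  # worst case for a greedy (work-conserving) scheduler


def phase_bounds(durations: list[float], slots: int) -> PhaseBounds:
    """Bound the makespan of one phase whose tasks run on `slots` parallel
    slots, assuming a new task starts whenever a slot becomes free."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    n = len(durations)
    if n == 0:
        return PhaseBounds(0.0, 0.0)
    avg = mean(durations)
    lower = n * avg / slots
    # Greedy bound: the task that finishes last cannot start later than the
    # work of the other tasks divided by `slots`; that is at most the line below.
    upper = (n - 1) * avg / slots + max(durations)
    return PhaseBounds(lower, upper)


def job_bounds(map_durations: list[float], map_slots: int,
               reduce_durations: list[float], reduce_slots: int) -> PhaseBounds:
    """Completion-time bounds for a two-phase job, assuming reduce tasks
    start only after every map task has finished (no shuffle overlap)."""
    m = phase_bounds(map_durations, map_slots)
    r = phase_bounds(reduce_durations, reduce_slots)
    return PhaseBounds(m.lower + r.lower, m.upper + r.upper)


# Example: 40 map tasks of 30 s on 10 slots, 5 reduce tasks of 60 s on 5 slots.
print(job_bounds([30.0] * 40, 10, [60.0] * 5, 5))
# -> PhaseBounds(lower=180.0, upper=255.0)
```

For identical tasks the lower bound is exact (four full waves of maps, one wave of reduces); the upper bound matters when task durations are skewed, which is where a purely average-based estimate goes wrong.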
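The synchronization problem can be seen directly in a small simulation: a phase ends only when its slowest task does, so phase time behaves like a maximum over task times rather than an average. The sketch below is a minimal Monte Carlo simulation under assumed conditions (exponentially distributed task durations, greedy slot assignment, a strict map-to-reduce barrier); all function names and parameters are hypothetical and not taken from the surveyed work.

```python
import heapq
import random


def simulate_phase(durations: list[float], slots: int, start: float = 0.0) -> float:
    """Greedy list scheduling: each task, in order, runs on the slot that
    frees up earliest. Returns the finish time of the slowest task, i.e. the
    moment the phase's synchronization barrier is released."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    free_at = [start] * slots  # min-heap of times at which each slot is free
    finish = start
    for d in durations:
        t = heapq.heappop(free_at)  # earliest-available slot
        end = t + d
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


def simulate_job(n_maps: int, map_slots: int, n_reduces: int, reduce_slots: int,
                 mean_map: float, mean_reduce: float, rng: random.Random) -> float:
    # Assumed workload: exponentially distributed task durations.
    maps = [rng.expovariate(1.0 / mean_map) for _ in range(n_maps)]
    reduces = [rng.expovariate(1.0 / mean_reduce) for _ in range(n_reduces)]
    map_done = simulate_phase(maps, map_slots)
    # Barrier: reduce tasks start only once the slowest map task has finished.
    return simulate_phase(reduces, reduce_slots, start=map_done)


def mean_completion(runs: int = 10_000, seed: int = 1, **workload) -> float:
    """Monte Carlo estimate of the mean job completion time."""
    rng = random.Random(seed)
    return sum(simulate_job(rng=rng, **workload) for _ in range(runs)) / runs
```

With one wave of 8 map tasks whose durations are exponential with mean 10, the expected phase time is 10·H_8 = 10(1 + 1/2 + ... + 1/8) ≈ 27.2, nearly three times the mean task time, and `mean_completion(n_maps=8, map_slots=8, n_reduces=0, reduce_slots=1, mean_map=10.0, mean_reduce=1.0)` should land close to that value. A model that tracks only mean task demand would predict about 10 here.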