MapReduce Algorithms
J. Ullman
Proceedings of the 2nd IKDD Conference on Data Sciences, 2015-03-20
DOI: https://doi.org/10.1145/2778865.2778866
We begin with a sketch of how MapReduce works and how MapReduce algorithms differ from general parallel algorithms. While algorithm analysis usually centers on the serial or parallel running time of the algorithms that solve a given problem, in the MapReduce world, the critical issue is a tradeoff between interprocessor communication and the parallel running time. We examine a fundamental problem, in which the output depends on comparison of all pairs of inputs (the "all-pairs" problem), and show matching upper and lower bounds for the communication/time tradeoff. Finally, we consider special cases of all-pairs, where only a subset of the pairs of inputs are of interest; an example is the problem of similarity join.
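To make the communication/time tradeoff for the all-pairs problem concrete, here is a minimal single-machine sketch of one well-known grouping scheme. It is an illustration under stated assumptions, not necessarily the construction used in the paper: the inputs are split into `g` groups, one reducer is created per unordered pair of groups, and each input is sent to the `g - 1` reducers whose key contains its group. The names `map_phase`, `reduce_phase`, and `all_pairs`, and the round-robin grouping rule, are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(items, g):
    """Send each input to the g - 1 reducers whose key contains its group."""
    for idx, item in enumerate(items):
        grp = idx % g  # assumption: round-robin assignment of inputs to groups
        for other in range(g):
            if other != grp:
                yield (min(grp, other), max(grp, other)), (idx, item)


def reduce_phase(key, values, g):
    """Yield the index pairs this reducer owns, so each pair appears once overall."""
    ordered = sorted(values, key=lambda v: v[0])
    for (i, _x), (j, _y) in combinations(ordered, 2):
        gi, gj = i % g, j % g
        if gi != gj:
            # Cross-group pair: only the reducer keyed by {gi, gj} sees both inputs.
            yield (i, j)
        else:
            # Same-group pair is seen by g - 1 reducers; pick a single owner.
            partner = (gi + 1) % g
            if key == (min(gi, partner), max(gi, partner)):
                yield (i, j)


def all_pairs(items, g):
    """Simulate the job; return (pairs, replication rate, largest reducer input)."""
    if g < 2 or not items:
        raise ValueError("need at least two groups and one input")
    buckets = defaultdict(list)
    sent = 0
    for key, value in map_phase(items, g):
        buckets[key].append(value)
        sent += 1  # one unit of mapper-to-reducer communication
    pairs = [p for key, vals in buckets.items() for p in reduce_phase(key, vals, g)]
    replication = sent / len(items)
    max_reducer = max(len(vals) for vals in buckets.values())
    return pairs, replication, max_reducer


if __name__ == "__main__":
    for g in (2, 4, 8):
        pairs, r, q = all_pairs(list(range(16)), g)
        print(f"g={g}: pairs={len(pairs)}, replication={r}, max reducer input={q}")
```

In this sketch every input is replicated `g - 1` times while each reducer receives about `2n/g` inputs, so increasing `g` shrinks the per-reducer work (and hence the parallel running time) at the price of more communication.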