Comparative performance of parallel join algorithms
J. Wolf, D. Dias, Philip S. Yu, John Turek
[1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems, 1991-12-01
DOI: 10.1109/PDIS.1991.183070 (https://doi.org/10.1109/PDIS.1991.183070)
The authors recently (1990, 1991) described two new join algorithms designed to address the data skew problem. These algorithms were based, respectively, on the traditional sort merge and hash join algorithms, and employed techniques borrowed from mathematical optimization theory. The current paper proposes significant improvements to both algorithms, increasing their effectiveness while simultaneously decreasing their execution times. It then focuses on the comparative performance of the improved algorithms and their more conventional sort merge and hash counterparts. The latter two are perfectly good algorithms except that they fail to deal with data skew. Both I/O- and CPU-bound configurations were examined. The new algorithms outperform their more conventional counterparts in the presence of just about any skew at all, dramatically so in cases of high skew.
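To make the data skew problem concrete, here is a minimal sketch of why a conventional parallel hash join can suffer when one join value is very frequent. It is not the authors' algorithm and does not reproduce their experiments. It only models the partitioning step, assuming integer join keys, a modulo hash, and synthetic data. The function names (`partition_loads`, `imbalance`, `skewed_keys`) are hypothetical.

```python
def partition_loads(keys, num_procs):
    """Count the tuples each processor receives under plain hash partitioning.

    Assumption: keys are non-negative integers and `key % num_procs` stands in
    for the hash function.
    """
    loads = [0] * num_procs
    for key in keys:
        loads[key % num_procs] += 1
    return loads


def imbalance(loads):
    """Busiest processor's load divided by the average load (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))


def skewed_keys(num_distinct, heavy_key, heavy_count):
    """Synthetic column: each key occurs once, except `heavy_key`,
    which occurs `heavy_count` times in total."""
    return list(range(num_distinct)) + [heavy_key] * (heavy_count - 1)


if __name__ == "__main__":
    uniform = list(range(1000))
    skewed = skewed_keys(1000, heavy_key=7, heavy_count=1001)
    for name, keys in (("uniform", uniform), ("skewed", skewed)):
        loads = partition_loads(keys, num_procs=4)
        print(name, loads, imbalance(loads))
```

In this toy setup the uniform column spreads evenly across the four processors, while the skewed column sends every copy of the heavy key to a single processor. Because all tuples with the same join value hash to the same place, plain hash partitioning cannot split that load, and the most heavily loaded processor tends to dominate the join's completion time.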