{"title":"拓扑感知并行连接","authors":"Xiao Hu, Paraschos Koutris","doi":"10.1145/3651598","DOIUrl":null,"url":null,"abstract":"We study the design and analysis of parallel join algorithms in a topology-aware computational model. In this model, the network is modeled as a directed graph, where each edge is associated with a cost function that depends on the data transferred between the two endpoints and the link bandwidth. The computation proceeds in synchronous rounds and the cost of each round is measured as the maximum cost over all the edges in the network. Our main result is an asymptotically optimal join algorithm over symmetric tree topologies. The algorithm generalizes prior topology-aware protocols for set intersection and cartesian product to a binary join over an arbitrary input distribution with possible data skew.","PeriodicalId":498157,"journal":{"name":"Proceedings of the ACM on Management of Data","volume":" 12","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Topology-aware Parallel Joins\",\"authors\":\"Xiao Hu, Paraschos Koutris\",\"doi\":\"10.1145/3651598\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study the design and analysis of parallel join algorithms in a topology-aware computational model. In this model, the network is modeled as a directed graph, where each edge is associated with a cost function that depends on the data transferred between the two endpoints and the link bandwidth. The computation proceeds in synchronous rounds and the cost of each round is measured as the maximum cost over all the edges in the network. Our main result is an asymptotically optimal join algorithm over symmetric tree topologies. The algorithm generalizes prior topology-aware protocols for set intersection and cartesian product to a binary join over an arbitrary input distribution with possible data skew.\",\"PeriodicalId\":498157,\"journal\":{\"name\":\"Proceedings of the ACM on Management of Data\",\"volume\":\" 12\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM on Management of Data\",\"FirstCategoryId\":\"0\",\"ListUrlMain\":\"https://doi.org/10.1145/3651598\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Management of Data","FirstCategoryId":"0","ListUrlMain":"https://doi.org/10.1145/3651598","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We study the design and analysis of parallel join algorithms in a topology-aware computational model. In this model, the network is modeled as a directed graph, where each edge is associated with a cost function that depends on the data transferred between the two endpoints and the link bandwidth. The computation proceeds in synchronous rounds and the cost of each round is measured as the maximum cost over all the edges in the network. Our main result is an asymptotically optimal join algorithm over symmetric tree topologies. The algorithm generalizes prior topology-aware protocols for set intersection and cartesian product to a binary join over an arbitrary input distribution with possible data skew.