{"title":"Session details: SESSION 5","authors":"K. Censor-Hillel","doi":"10.1145/3257329","DOIUrl":"https://doi.org/10.1145/3257329","url":null,"abstract":"","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123133079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimizing Total Weighted Flow Time with Calibrations","authors":"Vincent Chau, Minming Li, Samuel McCauley, Kai Wang","doi":"10.1145/3087556.3087573","DOIUrl":"https://doi.org/10.1145/3087556.3087573","url":null,"abstract":"In sensitive applications, machines need to be periodically calibrated to ensure that they run to high standards. Creating an efficient schedule on these machines requires attention to two metrics: ensuring good throughput of the jobs, and ensuring that not too much cost is spent on machine calibration. In this paper we examine flow time as a metric for scheduling with calibrations. While previous papers guaranteed that jobs would meet a certain deadline, we relax that constraint to a tradeoff: we want to balance how long the average job waits with how many costly calibrations we need to perform. One advantage of this metric is that it allows for online schedules (where an algorithm is unaware of a job until it arrives). Thus we give two types of results. We give an efficient offline algorithm which gives the optimal schedule on a single machine for a set of jobs which are known ahead of time. We also give online algorithms which adapt to jobs as they come. Our online algorithms are constant competitive for unweighted jobs on single or multiple machines, and constant-competitive for weighted jobs on a single machine.","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116967536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brief Announcement: STAR (Space-Time Adaptive and Reductive) Algorithms for Dynamic Programming Recurrences with more than O(1) Dependency","authors":"Yuan Tang, Shiyi Wang","doi":"10.1145/3087556.3087593","DOIUrl":"https://doi.org/10.1145/3087556.3087593","url":null,"abstract":"It's important to hit a space-time balance for a real-world algorithm to achieve high performance on modern shared-memory multi-core and many-core systems. However, a large class of dynamic programs with more than O(1) dependency achieved optimality either in space or time, but not both. In the literature, the problem is known as the fundamental space-time tradeoff. We propose the notion of \"Processor-Adaptiveness.\" In contrast to the prior \"Processor-Awareness\", our approach does not partition statically the problem space to the processor grid, but uses the processor count P to just upper bound the space and cache requirement in a cache-oblivious fashion. In the meantime, our processor-adaptive algorithms enjoy the full benefits of \"dynamic load-balance\", which is a key to achieving satisfactory speedup on a shared-memory system, especially when the problem dimension n is reasonably larger than P. By utilizing the \"busy-leaves\" property of runtime scheduler and a program managed memory pool that combines the advantages of stack and heap, we show that our STAR (Space-Time Adaptive and Reductive) technique can help these dynamic programs to achieve sublinear time bounds while keeping to be asymptotically work-, space-, and cache-optimal. The key achievement of this paper is to obtain the first sublinear O(n3/4 log n) time and optimal O(n3) work GAP algorithm; If we further bound the space and cache requirement of the algorithm to be asymptotically optimal, there will be a factor of P increase in time bound without sacrificing the work bound. If P = o(n1/4 / log n), the time bound stays sublinear and may be a better tradeoff between time and space requirements in practice.","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"483 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114279220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tight Bounds for Clairvoyant Dynamic Bin Packing","authors":"Y. Azar, Danny Vainstein","doi":"10.1145/3087556.3087570","DOIUrl":"https://doi.org/10.1145/3087556.3087570","url":null,"abstract":"In this paper we focus on the Clairvoyant Dynamic Bin Packing (DBP) problem, which extends the classical online bin packing problem in that items arrive and depart over time and the departure time of an item is known upon its arrival. The problem naturally arises when handling cloud-based networks. We focus specifically on the MinUsageTime cost function which aims to minimize the overall usage time of all bins that are opened during the packing process. Earlier work has shown a O(frac{log mu}{log log mu}) upper bound where mu is defined as the ratio between the maximal and minimal durations of all items. We improve the upper bound by giving an O(sqrt{log mu})-competitive algorithm. We then provide a matching lower bound of Omega(sqrt{log mu}) on the competitive ratio of any online algorithm, thus closing the gap with regards to this problem. We then focus on what we call the class of aligned inputs and give a O(log log mu)-competitive algorithm for this case, beating the lower bound of the general case by an exponential factor. Surprisingly enough, the analysis of our algorithm that we present, is closely related to various properties of binary strings.","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"2020 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114295575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brief Announcement: Towards Fault-Tolerant Bin Packing for Online Cloud Resource Allocation","authors":"Chuanyou Li, Xueyan Tang","doi":"10.1145/3087556.3087596","DOIUrl":"https://doi.org/10.1145/3087556.3087596","url":null,"abstract":"We consider an online fault-tolerant bin packing problem that models the reliable resource allocation in cloud-based systems. In this problem, any feasible packing algorithm must satisfy an exclusion constraint and a space constraint. The exclusion constraint is generalized from the fault-tolerance requirement and the space constraint comes from the capacity planning. The target of bin packing is to minimize the number of bins used. We first derive a lower bound on the number of bins needed by any feasible packing algorithm. Then we study two heuristic algorithms mirroring and shifting. The mirroring algorithm has a low utilization of the bin capacity. Compared with the mirroring algorithm, the shifting algorithm requires fewer numbers of bins. However, in online packing, the process of opening bins by the shifting algorithm is not smooth. It turns out that even for packing a few items, the shifting algorithm needs to quickly open a large number of bins. We therefore propose a new heuristic algorithm named mixing which can gradually open new bins for incoming items. We prove that the mixing algorithm is feasible and show that it balances the number of bins used and the process of opening bins.","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129353214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brief Announcement: Using Multi-Level Parallelism and 2-3 Cuckoo Filters for Set Intersection Queries and Sparse Boolean Matrix Multiplication","authors":"D. Eppstein, M. Goodrich","doi":"10.1145/3087556.3087599","DOIUrl":"https://doi.org/10.1145/3087556.3087599","url":null,"abstract":"We use multi-level parallelism and a new type of data structures, known as 2-3 cuckoo filters, to answer set intersection queries faster than previous methods, with applications to improved sparse Boolean matrix multiplication.","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114372154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brief Announcement: Extending Transactional Memory with Atomic Deferral","authors":"Tingzhe Zhou, Victor Luchangco, Michael F. Spear","doi":"10.1145/3087556.3087600","DOIUrl":"https://doi.org/10.1145/3087556.3087600","url":null,"abstract":"Atomic deferral is a language-level mechanism for transactional memory (TM) that enables programmers to move output and long-running operations out of a transaction's body without sacrificing serializability: the deferred operation appears to execute as part of its parent transaction, even though it does not make use of TM. We introduce the first implementation of atomic deferral, based on transaction-friendly locks; describe enhancements to its API; and demonstrate its effectiveness. Our experiments show that atomic deferral is useful for its original purpose of moving output operations out of transactions, and also for moving expensive library calls out of transactions. The result is a significant improvement in performance for the PARSEC dedup kernel, for both software and hardware TM systems.","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126244843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brief Announcement: Graph Matching in Massive Datasets","authors":"Soheil Behnezhad, M. Derakhshan, Hossein Esfandiari, E. Tan, Hadi Yami","doi":"10.1145/3087556.3087601","DOIUrl":"https://doi.org/10.1145/3087556.3087601","url":null,"abstract":"In this paper we consider the maximum matching problem in large bipartite graphs. We present a new algorithm that finds the maximum matching in a few iterations of a novel edge sampling technique. This algorithm can be implemented in big data settings such as streaming setting and MapReduce setting, where each iteration of the algorithm maps to one pass over the stream, or one MapReduce round of computation, respectively. We prove that our algorithm provides a 1-eps approximate solution to the maximum matching in 1/eps rounds which improves the prior work in terms of the number of passes/rounds. Our algorithm works even better when we run it on real datasets and finds the exact maximum matching in 4 to 8 rounds while sampling only about %1 of the total edges.","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124231716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: SESSION 8","authors":"B. Patt-Shamir","doi":"10.1145/3257332","DOIUrl":"https://doi.org/10.1145/3257332","url":null,"abstract":"","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129893212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Energy Conservation in Data Centers","authors":"S. Albers","doi":"10.1145/3087556.3087560","DOIUrl":"https://doi.org/10.1145/3087556.3087560","url":null,"abstract":"We formulate and study an optimization problem that arises in the energy management of data centers and, more generally, multiprocessor environments. Data centers host a large number of heterogeneous servers. Each server has an active state and several standby/sleep states with individual power consumption rates. The demand for computing capacity varies over time. Idle servers may be transitioned to low-power modes so as to rightsize the pool of active servers. The goal is to find a state transition schedule for the servers that minimizes the total energy consumed. On a small scale the same problem arises in multi-core architectures with heterogeneous processors on a chip. One has to determine active and idle periods for the cores so as to guarantee a certain service and minimize the consumed energy. For this power/capacity management problem, we develop two main results. We use the terminology of the data center setting. First, we investigate the scenario that each server has two states, i.e. an active state and a sleep state. We show that an optimal solution, minimizing energy consumption, can be computed in polynomial time by a combinatorial algorithm. The algorithm resorts to a single-commodity min-cost flow computation. Second, we study the general scenario that each server has an active state and multiple standby/sleep states. We devise a tau-approximation algorithm that relies on a two-commodity min-cost flow computation. Here tau is the number of different server types. A data center has a large collection of machines but only a relatively small number of different server architectures. Moreover, in the optimization one can assign servers with comparable energy consumption to the same class. Technically, both of our algorithms involve non-trivial flow modification procedures. In particular, given a fractional two-commodity flow, our algorithm executes advanced rounding and flow packing routines.","PeriodicalId":162994,"journal":{"name":"Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116755308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}