Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures: Latest Articles (Page 5)

Online Flexible Job Scheduling for Minimum Span

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2017-07-24 DOI: 10.1145/3087556.3087562

Runtian Ren, Xueyan Tang

In this paper, we study an online Flexible Job Scheduling (FJS) problem. The input is a set of jobs, each with an arrival time, a starting deadline, and a processing length. The scheduler must start each job between its arrival time and its starting deadline. Once started, a job runs for its processing length without interruption. The goal is to minimize the span of all the jobs, that is, the total time during which at least one job is running. We study online FJS under both the non-clairvoyant and clairvoyant settings. In the non-clairvoyant setting, the processing length of each job is not known for scheduling purposes. We first establish a lower bound of μ on the competitive ratio of any deterministic online scheduler, where μ is the ratio of the maximum to the minimum job processing length. Then, we propose two O(μ)-competitive schedulers: Batch and Batch+. The Batch+ scheduler is proved to have a tight competitive ratio of μ+1. In the clairvoyant setting, the processing length of each job is known at its arrival and can be used for scheduling purposes. We establish a lower bound of (√5+1)/2 on the competitive ratio of any deterministic online scheduler, and propose two O(1)-competitive schedulers: Classify-by-Duration Batch+ and Profit. The Profit scheduler achieves a competitive ratio of 4+2√2. Our work lays the foundation for extending several online job scheduling problems in cloud and energy-efficient computing to jobs that have laxity in starting.
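The span objective defined in the abstract above can be made concrete with a small sketch. It only evaluates a given schedule; it is not the Batch, Batch+, or Profit scheduler, whose rules the abstract does not spell out. The function name `span` and the (start time, processing length) input format are assumptions made for illustration.

```python
from typing import List, Tuple


def span(schedule: List[Tuple[float, float]]) -> float:
    """Total time during which at least one job is running.

    Each entry is (start_time, processing_length); a job is assumed
    to occupy the interval [start_time, start_time + processing_length].
    """
    intervals = sorted((s, s + p) for s, p in schedule)
    total = 0.0
    cur_start = cur_end = None
    for lo, hi in intervals:
        if cur_end is None or lo > cur_end:
            # A gap before this job: close the previous busy period.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = lo, hi
        else:
            # Overlapping or touching job: extend the busy period.
            cur_end = max(cur_end, hi)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

For two jobs of length 2, running them back to back, `span([(0, 2), (2, 2)])`, costs 4 time units, while starting both at the same moment, `span([(1, 2), (1, 2)])`, costs 2. This is the kind of saving that laxity in start times makes possible.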

Citations: 4

Sharing is Caring: Multiprocessor Scheduling with a Sharable Resource

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2017-07-24 DOI: 10.1145/3087556.3087578

Peter Kling, Alexander Mäcker, Sören Riechers, Alexander Skopalik

We consider a scheduling problem on m identical processors sharing an arbitrarily divisible resource. In addition to assigning jobs to processors, the scheduler must distribute the resource among the processors (e.g., for three processors in shares of 20%, 15%, and 65%) and adjust this distribution over time. Each job j comes with a size p_j ∈ R and a resource requirement r_j > 0. Jobs do not benefit from receiving a share of the resource larger than r_j. Providing a job with only a fraction of its resource requirement, however, causes a linear decrease in processing efficiency. We seek a (non-preemptive) job and resource assignment minimizing the makespan. Our main result is an efficient approximation algorithm that achieves an approximation ratio of 2 + 1/(m-2). It can be improved to an (asymptotic) ratio of 1 + 1/(m-1) if all jobs have unit size. Our algorithms also imply new results for a well-known bin packing problem with splittable items and a restricted number of allowed item parts per bin. Building on this solution, we also derive an approximation algorithm with similar guarantees for a setting in which we introduce so-called tasks, each containing several jobs, and where we are interested in the average completion time of tasks (a task is completed when all its jobs are completed).
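The linear efficiency model in the abstract above can be read as follows: a job with requirement r_j that receives a share s of the resource runs at rate min(s, r_j) / r_j. The sketch below encodes that reading; the formula is an interpretation of the abstract's wording, and the function names are hypothetical rather than taken from the paper.

```python
def processing_rate(share: float, requirement: float) -> float:
    """Fraction of full speed at which a job runs (assumed linear model).

    A share above the requirement gives no extra benefit; a share below
    it slows the job down proportionally.
    """
    if requirement <= 0:
        raise ValueError("requirement must be positive")
    if share < 0:
        raise ValueError("share must be non-negative")
    return min(share, requirement) / requirement


def completion_time(size: float, share: float, requirement: float) -> float:
    """Time to finish a job of the given size at a constant resource share."""
    rate = processing_rate(share, requirement)
    if rate == 0:
        return float("inf")  # the job makes no progress without any resource
    return size / rate
```

Under this reading, a job that needs 40% of the resource but receives only 20% runs at half speed, so a job of size 3 takes 6 time units instead of 3.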

Citations: 9

Swarm-based Incast Congestion Control in Datacenters Serving Web Applications

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2017-07-24 DOI: 10.1145/3087556.3087559

Haoyu Wang, Haiying Shen, Guoxin Liu

In Web applications served by today's datacenters, incast congestion at the front-end server seriously degrades data request latency, because a single data request triggers a large volume of transmissions from a large number of data servers in a short time. Previous incast congestion control methods usually consider direct data transmissions from data servers to the front-end server, which makes it difficult to control the sending speed or adjust workloads, since each data server transmits only a few data objects and does so only briefly. In this paper, we propose a Swarm-based Incast Congestion Control (SICC) system. SICC forms all target data servers of one request in the same rack into a swarm. In each swarm, a data server (called the hub) is selected to forward all data objects to the front-end server, so that the number of data servers concurrently connected to the front-end server is reduced, which avoids incast congestion. Also, the continuous data transmission from hubs to the front-end server facilitates the development of other strategies to further control incast congestion. To fully utilize the bandwidth, SICC uses a two-level data transmission speed control method to adjust the data transmission speeds of hubs. A query redirection method further reduces the request latency by balancing the remaining transmission times between hubs. Our experiments in simulation and on a real cluster demonstrate that SICC outperforms other incast control methods in improving throughput and reducing data request latency.
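The swarm formation step described in the abstract above (group a request's target data servers by rack, then pick one hub per rack) can be sketched as below. The abstract does not say how the hub is chosen, so taking the lexicographically smallest server name is purely a placeholder assumption, as are the function name `form_swarms` and the (server, rack) input format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def form_swarms(targets: List[Tuple[str, str]]) -> Dict[str, Tuple[str, List[str]]]:
    """Group target servers by rack and pick one hub per rack.

    `targets` holds (server_name, rack_name) pairs for a single request.
    Returns rack_name -> (hub, sorted member list).
    """
    by_rack: Dict[str, List[str]] = defaultdict(list)
    for server, rack in targets:
        by_rack[rack].append(server)

    swarms: Dict[str, Tuple[str, List[str]]] = {}
    for rack, servers in by_rack.items():
        hub = min(servers)  # placeholder rule, not SICC's actual hub selection
        swarms[rack] = (hub, sorted(servers))
    return swarms
```

With this grouping the front-end server receives data from one hub per rack rather than from every target server, which is the reduction in concurrent connections that the abstract credits with avoiding incast congestion.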

Citations: 11

Session details: SESSION 4

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2017-07-24 DOI: 10.1145/3257328

S. Albers

Citations: 0

Distributed Detection of Cycles

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2017-06-13 DOI: 10.1145/3087556.3087571

P. Fraigniaud, D. Olivetti

Citations: 23

Is Our Model for Contention Resolution Wrong?: Confronting the Cost of Collisions

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2017-05-25 DOI: 10.1145/3087556.3087584

William C. Anderton, Maxwell Young

Randomized binary exponential backoff (BEB) is a popular algorithm for coordinating access to a shared channel. With an operational history exceeding four decades, BEB is currently an important component of several wireless standards. Despite this track record, prior theoretical results indicate that under bursty traffic (1) BEB yields poor makespan and (2) superior algorithms are possible. To date, the degree to which these findings manifest in practice has not been resolved. To address this issue, we examine one of the strongest cases against BEB: n packets that simultaneously begin contending for the wireless channel. Using Network Simulator 3, we compare BEB against more recent algorithms that are inspired by it but whose makespan guarantees are superior. Surprisingly, we discover that these newer algorithms significantly underperform. Through further investigation, we identify as the culprit a flawed but common abstraction regarding the cost of collisions. Our experimental results are complemented by analytical arguments that the number of collisions, and not solely makespan, is an important metric to optimize. We argue that these findings have implications for the design of contention-resolution algorithms.
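The scenario in the abstract above, n packets that start contending at the same time, can be explored with a minimal sketch of windowed binary exponential backoff in the textbook slotted model. This is not the Network Simulator 3 setup used in the paper; the function name `simulate_beb`, the windowed variant, and the unit cost per slot are assumptions. The sketch reports both metrics the abstract discusses: makespan in slots and number of collision slots.

```python
import random
from typing import Tuple


def simulate_beb(n: int, seed: int = 0) -> Tuple[int, int]:
    """Windowed BEB for n packets that all arrive at time 0.

    In window i (of size 2**i slots) every remaining packet picks one slot
    uniformly at random; a packet succeeds if it is alone in its slot.
    Returns (makespan_in_slots, collision_slots).
    """
    rng = random.Random(seed)
    remaining = n
    window = 1
    slots_used = 0
    collisions = 0
    while remaining > 0:
        counts = [0] * window
        for _ in range(remaining):
            counts[rng.randrange(window)] += 1
        successes = sum(1 for c in counts if c == 1)
        collisions += sum(1 for c in counts if c >= 2)
        remaining -= successes
        if remaining == 0:
            # The run ends at the last successful slot of this window.
            last = max(i for i, c in enumerate(counts) if c == 1)
            slots_used += last + 1
        else:
            slots_used += window
        window *= 2  # double the contention window
    return slots_used, collisions
```

Note that this model charges a collision exactly one slot, the same as a success or an idle slot. That uniform cost is the kind of abstraction the abstract calls into question, so the collision count is returned separately rather than folded into the makespan.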

Citations: 6

Bounding Cache Miss Costs of Multithreaded Computations Under General Schedulers: Extended Abstract

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2017-05-23 DOI: 10.1145/3087556.3087572

R. Cole, V. Ramachandran

Citations: 4

Randomized Composable Coresets for Matching and Vertex Cover

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2017-05-23 DOI: 10.1145/3087556.3087581

Sepehr Assadi, S. Khanna

A common approach for designing scalable algorithms for massive data sets is to distribute the computation across, say, k machines and process the data using limited communication between them. A particularly appealing framework here is the simultaneous communication model, whereby each machine constructs a small representative summary of its own data and one obtains an approximate or exact solution from the union of the representative summaries. If the representative summaries needed for a problem are small, then this results in a communication-efficient and round-optimal (requiring essentially no interaction between the machines) protocol. Some well-known examples of techniques for creating summaries include sampling, linear sketching, and composable coresets. These techniques have been successfully used to design communication-efficient solutions for many fundamental graph problems. However, two prominent problems are notably absent from the list of successes, namely, the maximum matching problem and the minimum vertex cover problem. Indeed, it was shown recently that for both these problems, even achieving a modest approximation factor of polylog(n) requires using representative summaries of size Ω̃(n^2), i.e., essentially no better summary exists than each machine simply sending its entire input graph. The main insight of our work is that the intractability of matching and vertex cover in the simultaneous communication model is inherently connected to an adversarial partitioning of the underlying graph across machines. We show that when the underlying graph is randomly partitioned across machines, both these problems admit randomized composable coresets of size Õ(n) that yield an Õ(1)-approximate solution. (Here and throughout the paper, Õ(·) notation suppresses polylog(n) factors, where n is the number of vertices in the graph.) In other words, a small subgraph of the input graph at each machine can be identified as its representative summary, and the final answer is then obtained by simply running any maximum matching or minimum vertex cover algorithm on these combined subgraphs. This results in an Õ(1)-approximation simultaneous protocol for these problems with Õ(nk) total communication when the input is randomly partitioned across k machines. We also prove our results are optimal in a very strong sense: we not only rule out the existence of smaller randomized composable coresets for these problems but in fact show that our Õ(nk) bound for total communication is optimal for any simultaneous communication protocol (i.e., not only for randomized coresets) for these two problems. Finally, by a standard application of composable coresets, our results also imply MapReduce algorithms with the same approximation guarantee in one or two rounds of communication, improving the previous best known round complexity for these problems.
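The protocol shape described in the abstract above (randomly partition the graph, let each machine send a small subgraph, then solve on the union) can be sketched for the matching problem. Two loud assumptions apply: the sketch uses each machine's local maximum matching as its summary, which the abstract does not state explicitly, and it handles only bipartite graphs with disjoint left and right vertex names so that a short augmenting-path routine suffices. All function names are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (left vertex, right vertex); the two sides are disjoint


def max_bipartite_matching(edges: List[Edge]) -> List[Edge]:
    """Maximum matching via augmenting paths (Kuhn's algorithm)."""
    adj: Dict[str, List[str]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    match_right: Dict[str, str] = {}  # right vertex -> matched left vertex

    def try_augment(u: str, seen: Set[str]) -> bool:
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # Use v if it is free, or if its current partner can be re-matched.
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in adj:
        try_augment(u, set())
    return [(u, v) for v, u in match_right.items()]


def coreset_matching(edges: List[Edge], k: int, seed: int = 0) -> List[Edge]:
    """Simulate k machines: random edge partition, local summaries, final solve."""
    rng = random.Random(seed)
    parts: List[List[Edge]] = [[] for _ in range(k)]
    for e in edges:
        parts[rng.randrange(k)].append(e)  # random partition of the edges
    summaries = [max_bipartite_matching(p) for p in parts]  # one summary per machine
    union = [e for s in summaries for e in s]
    return max_bipartite_matching(union)  # coordinator solves on the union
```

Each summary has at most one edge per vertex, so the coordinator sees far fewer edges than the full input. The result is always a valid matching of the original graph, but it may be smaller than the true maximum; the approximation guarantee claimed in the abstract concerns the paper's construction, not this simplified sketch.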

Citations: 49

Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2017-05-10 DOI: 10.1145/3087556.3087576

Shahbaz Khan

The depth-first search (DFS) tree is a fundamental data structure for solving various graph problems. The classical algorithm [SIAMCOMP74] for building a DFS tree requires O(m+n) time for a given undirected graph G having n vertices and m edges. Recently, Baswana et al. [SODA16] presented a simple algorithm for updating the DFS tree of an undirected graph after an edge/vertex update in Õ(n) time. However, their algorithm is strictly sequential. We present an algorithm achieving similar bounds that can be adapted easily to the parallel environment. In the parallel environment, a DFS tree can be computed from scratch using O(m) processors in expected Õ(1) time [SICOMP90] on an EREW PRAM, whereas the best deterministic algorithm takes Õ(√n) time [SIAMCOMP90,JAL93] on a CRCW PRAM. Our algorithm can be used to develop optimal (up to polylog n factors) deterministic algorithms for maintaining fully dynamic DFS and fault tolerant DFS of an undirected graph. (1) Parallel fully dynamic DFS: given any arbitrary online sequence of vertex or edge updates, we can maintain a DFS tree of an undirected graph in Õ(1) time per update using m processors on an EREW PRAM. (2) Parallel fault tolerant DFS: an undirected graph can be preprocessed to build a data structure of size O(m) such that for a set of k updates (where k is constant) in the graph, a DFS tree of the updated graph can be computed in Õ(1) time using n processors on an EREW PRAM. For constant k, this is also work optimal (up to polylog n factors). Moreover, our fully dynamic DFS algorithm provides, in a seamless manner, nearly optimal (up to polylog n factors) algorithms for maintaining a DFS tree in the semi-streaming environment and a restricted distributed model. These are the first parallel, semi-streaming and distributed algorithms for maintaining a DFS tree in the dynamic setting.
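As a baseline for the abstract above, here is a minimal sketch of the classical sequential construction of a DFS tree in O(m+n) time. It is the static textbook procedure, not the paper's dynamic or parallel algorithm; the function name `dfs_tree` and the adjacency-list input format are assumptions.

```python
from typing import Dict, Hashable, Iterator, List, Optional, Tuple


def dfs_tree(adj: Dict[Hashable, List[Hashable]],
             root: Hashable) -> Dict[Hashable, Optional[Hashable]]:
    """Return the parent map of a DFS tree of root's connected component.

    Iterative depth-first search: each adjacency list is scanned once,
    giving O(m + n) time for n vertices and m edges.
    """
    parent: Dict[Hashable, Optional[Hashable]] = {root: None}
    stack: List[Tuple[Hashable, Iterator[Hashable]]] = [(root, iter(adj[root]))]
    while stack:
        u, neighbours = stack[-1]
        for v in neighbours:
            if v not in parent:
                parent[v] = u  # tree edge (u, v)
                stack.append((v, iter(adj[v])))
                break
        else:
            stack.pop()  # all neighbours of u explored; backtrack
    return parent
```

Recomputing this tree from scratch after every edge or vertex update is the naive alternative that dynamic DFS algorithms aim to beat.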

Citations: 8

Distributed Partial Clustering

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures Pub Date : 2017-03-05 DOI: 10.1145/3087556.3087568

S. Guha, Yi Li, Qin Zhang

Recent years have witnessed the increasing popularity of algorithm design for distributed data, largely because massive datasets are often collected and stored in different locations. In the distributed setting, communication typically dominates the query processing time. Thus it becomes crucial to design communication-efficient algorithms for queries on distributed data. At the same time, it has been widely recognized that partial optimizations, where we are allowed to disregard a small part of the data, provide significantly better solutions. The motivation for disregarding points often arises from noise and other phenomena that are pervasive in large data scenarios. In this paper we focus on partial clustering problems (k-center, k-median, and k-means) in the distributed model, and provide algorithms with communication sublinear in the input size. As a consequence, we develop the first algorithms for the partial k-median and k-means objectives that run in subquadratic time. We also initiate the study of distributed algorithms for clustering uncertain data, where each data point can fall into multiple locations according to a probability distribution.
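The idea of a partial objective in the abstract above, where a small number of points may be disregarded, can be illustrated for k-center. The sketch only evaluates the cost of a given set of centers when up to z points are ignored; it is not the paper's distributed algorithm. The function name `partial_k_center_cost` and the parameter name `z` for the number of ignored points are assumptions.

```python
import math
from typing import List, Sequence


def partial_k_center_cost(points: List[Sequence[float]],
                          centers: List[Sequence[float]],
                          z: int) -> float:
    """k-center cost of `centers` when the z farthest points are ignored.

    Each point is charged its Euclidean distance to the nearest center;
    the cost is the largest charge among the points that are kept.
    """
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    kept = dists[:max(0, len(dists) - z)]
    return kept[-1] if kept else 0.0
```

With points (0, 0), (1, 0), (0, 1), and (100, 100) and a single center at the origin, the ordinary cost (z = 0) is dominated by the far point at distance about 141.4, while allowing one ignored point (z = 1) drops the cost to 1. This shows how a single noisy point can distort the non-partial objective.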

Citations: 29