{"title":"Fair Online Scheduling for Selfish Jobs on Heterogeneous Machines","authors":"Sungjin Im, Janardhan Kulkarni","doi":"10.1145/2935764.2935773","DOIUrl":"https://doi.org/10.1145/2935764.2935773","url":null,"abstract":"Scheduling jobs on multiple machines has numerous applications and has been a central topic of research in the scheduling literature. Recently, much progress has been made particularly in online scheduling with the development of powerful analysis tools. In this line of work, a centralized scheduler typically dispatches jobs to machines to make the best use of the given resources and achieve the best system performance, as measured by a certain global scheduling objective. While this approach has been very successful in attacking scheduling problems of growing complexity, the underlying assumption that jobs follow a centralized scheduler may not be realistic in certain scheduling settings. In this paper we initiate the study of online scheduling for selfish jobs in the presence of multiple machines. Selfish behavior of jobs is a common aspect observed in the absence of a centralized scheduler. We explore this question in the unrelated machines setting, arguably one of the most general multiple machine models. In this model each job can have a completely different processing time on each machine. Motivated by several practical scenarios, we assume that when a job arrives it chooses the machine that completes the job the earliest, i.e., minimizes the flow time of the job. The goal is to design a local scheduling algorithm on each machine with the goal of minimizing the total (weighted) flow time. We show that the algorithm Smoothed Latest Arrival Processor Sharing, which was introduced in a recent work by Im et al. [27,28], yields an O(1/ε^2)-competitive schedule when given (1 + ε) speed. We also extend our result to minimize total flow-time plus energy consumed. 
To show this result we establish several interesting properties of the algorithm which could be of potential use for other scheduling problems.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123303841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Power of Migration in Online Machine Minimization","authors":"Lin Chen, Nicole Megow, Kevin Schewior","doi":"10.1145/2935764.2935786","DOIUrl":"https://doi.org/10.1145/2935764.2935786","url":null,"abstract":"In this paper we investigate the power of migration in online scheduling on multiple parallel machines. The problem is to schedule preemptable jobs with release dates and deadlines on a minimum number of machines. We show that migration, that is, allowing that a preempted job is continued on a different machine, has a huge impact on the performance of a schedule. More precisely, let m be the number of machines required by a migratory solution; then the increase in the number of machines when disallowing migration is unbounded in m. This complements and strongly contrasts previous results on variants of this problem. In both the offline variant and a model allowing extra speed, the power of migration is limited as the increase of number of machines and speed, respectively, can be bounded by a small constant. In this paper, we also derive the first non-trivial bounds on the competitive ratio for non-migratory online scheduling to minimize the number of machines without extra speed. We show that in general no online algorithm can achieve a competitive ratio of f(m), for any function f, and give a lower bound of Ω(log n). 
For agreeable instances and instances with \"loose\" jobs, we give O(1)-competitive algorithms and, for laminar instances, we derive an O(log m)-competitive algorithm.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124952585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Election vs. Selection: How Much Advice is Needed to Find the Largest Node in a Graph?","authors":"Avery Miller, A. Pelc","doi":"10.1145/2935764.2935772","DOIUrl":"https://doi.org/10.1145/2935764.2935772","url":null,"abstract":"Finding the node with the largest label in a labeled network, modeled as an undirected connected graph, is one of the fundamental problems in distributed computing. This is the way in which leader election is usually solved. We consider two distinct tasks in which the largest-labeled node is found deterministically. In selection, this node has to output 1 and all other nodes have to output 0. In election, the other nodes must additionally learn the largest label (everybody has to know who is the elected leader). Our aim is to compare the difficulty of these two seemingly similar tasks executed under stringent running time constraints. The measure of difficulty is the amount of information that nodes of the network must initially possess, in order to solve the given task in an imposed amount of time. Following the standard framework of algorithms with advice, this information (a single binary string) is provided to all nodes at the start by an oracle knowing the entire graph. The length of this string is called the size of advice. The paradigm of algorithms with advice has a far-reaching importance in the realm of network algorithms. Lower bounds on the size of advice give us impossibility results based strictly on the amount of initial knowledge outlined in a model's description. This more general approach should be contrasted with traditional results that focus on specific kinds of information available to nodes, such as the size, diameter, or maximum node degree. Consider the class of n-node graphs with any diameter diam ≤ D, for some integer D. If time is larger than diam, then both tasks can be solved without advice. 
For the task of election, we show that if time is smaller than diam, then the optimal size of advice is Θ(log n), and if time is exactly diam, then the optimal size of advice is Θ(log D). For the task of selection, the situation changes dramatically, even within the class of rings. Indeed, for the class of rings, we show that, if time is O(diam^ε), for any ε < 1, then the optimal size of advice is Θ(log D), and, if time is Θ(diam) (and at most diam) then this optimal size is Θ(log log D). Thus there is an exponential increase of difficulty (measured by the size of advice) between selection in time O(diam^ε), for any ε < 1, and selection in time Θ(diam). As for the comparison between election and selection, our results show that, perhaps surprisingly, while for small time, the difficulty of these two tasks on rings is similar, for time Θ(diam) the difficulty of election (measured by the size of advice) is exponentially larger than that of selection.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122263316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concurrent Search Data Structures Can Be Blocking and Practically Wait-Free","authors":"Tudor David, R. Guerraoui","doi":"10.1145/2935764.2935774","DOIUrl":"https://doi.org/10.1145/2935764.2935774","url":null,"abstract":"We argue that there is virtually no practical situation in which one should seek a \"theoretically wait-free\" algorithm at the expense of a state-of-the-art blocking algorithm in the case of search data structures: blocking algorithms are simple, fast, and can be made \"practically wait-free\". We draw this conclusion based on the most exhaustive study of blocking search data structures to date. We consider (a) different search data structures of different sizes, (b) numerous uniform and non-uniform workloads, representative of a wide range of practical scenarios, with different percentages of update operations, (c) with and without delayed threads, (d) on different hardware technologies, including processors providing HTM instructions. We explain our claim that blocking search data structures are practically wait-free through an analogy with the birthday paradox, revealing that, in state-of-the-art algorithms implementing such data structures, the probability of conflicts is extremely small. 
When conflicts occur as a result of context switches and interrupts, we show that HTM-based locks enable blocking algorithms to cope with them.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128760043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Churn- and DoS-resistant Overlay Networks Based on Network Reconfiguration","authors":"Maximilian Drees, R. Gmyr, C. Scheideler","doi":"10.1145/2935764.2935783","DOIUrl":"https://doi.org/10.1145/2935764.2935783","url":null,"abstract":"We present three robust overlay networks: First, we present a network that organizes the nodes into an expander and is resistant to even massive adversarial churn. Second, we develop a network based on the hypercube that maintains connectivity under adversarial DoS-attacks. For the DoS-attacks we use the notion of a Ω(log log n)-late adversary which only has access to topological information that is at least Ω(log log n) rounds old. Finally, we develop a network that combines both churn- and DoS-resistance. The networks gain their robustness through constant network reconfiguration, i.e., the topology of the networks changes constantly. Our reconfiguration algorithms are based on node sampling primitives for expanders and hypercubes that allow each node to sample a logarithmic number of nodes uniformly at random in O(log log n) communication rounds. These primitives are specific to overlay networks and their optimal runtime represents an exponential improvement over known techniques. 
Our results have a wide range of applications, for example in the area of scalable and robust peer-to-peer systems.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"40 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129090193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brief Announcement: Preserving Happens-before in Persistent Memory","authors":"Joseph Izraelevitz, H. Mendes, M. Scott","doi":"10.1145/2935764.2935810","DOIUrl":"https://doi.org/10.1145/2935764.2935810","url":null,"abstract":"Nonvolatile, byte-addressable memory (NVM) will soon be commercially available, but registers and caches are expected to remain transient on most machines. Without careful management, the data preserved in the wake of a crash are likely to be inconsistent and thus unusable. Previous work has explored the semantics of instructions used to push the contents of cache to NVM. These semantics comprise a \"memory persistency model,\" analogous to a traditional \"memory consistency model.\" In this brief announcement we introduce \"explicit epoch persistency\", a memory persistency model that captures the current and expected semantics of Intel x86 and ARM v8 persistent memory instructions. We also present a construction that augments any data-race-free program (for release consistency or any stronger memory model) in such a way that preserved data are guaranteed to represent a consistent cut in the happens-before graph of the program's execution.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115421565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brief Announcement: Relaxed Byzantine Vector Consensus","authors":"Zhuolun Xiang, N. Vaidya","doi":"10.1145/2935764.2935808","DOIUrl":"https://doi.org/10.1145/2935764.2935808","url":null,"abstract":"Byzantine vector consensus requires that non-faulty processes reach agreement on a decision (or output) that is in the convex hull of the inputs at the non-faulty processes. Recent work has shown that, for n processes with up to f Byzantine failures, when the inputs are d-dimensional vectors of reals, n ≥ max{3f+1, (d+1)f+1} is the tight bound for synchronous systems, and n ≥ (d+2)f+1 is tight for approximate consensus in asynchronous systems. Due to the dependence of the lower bound on vector dimension d, the number of processes necessary becomes large when the vector dimension is large. With the hope of reducing the lower bound on n, we propose relaxed versions of Byzantine vector consensus: k-relaxed Byzantine vector consensus and (δ,p)-relaxed Byzantine vector consensus. k-relaxed consensus only requires consensus for projections of inputs on every subset of k dimensions. (δ,p)-relaxed consensus requires that the output be within distance δ of the convex hull of the non-faulty inputs, where distance is defined using the L_p-norm. An input-dependent δ allows the distance from the non-faulty convex hull to be dependent on the maximum distance between the non-faulty inputs. We show that for k-relaxed consensus and (δ,p)-relaxed consensus with constant δ≥0, the bound on n is identical to the bound stated above for the original vector consensus problem. On the other hand, when δ depends on the inputs, we show that the bound on n is smaller when d ≥ 3. 
Input-dependent δ may be of interest in practice -- in essence, input-dependent δ scales with the spread of the inputs.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114422488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallelism in Randomized Incremental Algorithms","authors":"G. Blelloch, Yan Gu, Julian Shun, Yihan Sun","doi":"10.1145/2935764.2935766","DOIUrl":"https://doi.org/10.1145/2935764.2935766","url":null,"abstract":"In this paper we show that most sequential randomized incremental algorithms are in fact parallel. We consider several random incremental algorithms including algorithms for comparison sorting and Delaunay triangulation; linear programming, closest pair, and smallest enclosing disk in constant dimensions; as well as least-element lists and strongly connected components on graphs. We analyze the dependence between iterations in an algorithm, and show that the dependence structure is shallow for all of the algorithms, implying high parallelism. We identify three types of dependences found in the algorithms studied and present a framework for analyzing each type of algorithm. Using the framework gives work-efficient polylogarithmic-depth parallel algorithms for most of the problems that we study. Some of these algorithms are straightforward (e.g., sorting and linear programming), while others are more novel and require more effort to obtain the desired bounds (e.g., Delaunay triangulation and strongly connected components). 
The most surprising of these results is for planar Delaunay triangulation for which the incremental approach is by far the most commonly used in practice, but for which it was not previously known whether it is theoretically efficient in parallel.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126401200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Brief Announcement: Approximating the I/O Complexity of One-Shot Red-Blue Pebbling","authors":"Timothy Carpenter, F. Rastello, P. Sadayappan, Anastasios Sidiropoulos","doi":"10.1145/2935764.2935807","DOIUrl":"https://doi.org/10.1145/2935764.2935807","url":null,"abstract":"Red-blue pebbling is a model of computation that captures the complexity of I/O operations in systems with external memory access. We focus on one-shot pebbling strategies, that is, strategies without re-computation. Prior work on this model has focused on finding upper and lower bounds on the I/O complexity of certain families of graphs. We give a polynomial-time bi-criteria approximation algorithm for this problem for graphs with bounded out-degree. More precisely, given an n-vertex DAG that admits a pebbling strategy with R red pebbles and I/O complexity opt, our algorithm outputs a strategy using O(R ⋅ log^{3/2} n) red pebbles, and I/O complexity O(opt ⋅ log^{3/2} n). We further extend our result to the generalization of red-blue pebble games that correspond to multi-level memory hierarchies. Finally, we complement our theoretical analysis with an experimental evaluation of our algorithm for red-blue pebbling.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121040264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"General Profit Scheduling and the Power of Migration on Heterogeneous Machines","authors":"Sungjin Im, Benjamin Moseley","doi":"10.1145/2935764.2935771","DOIUrl":"https://doi.org/10.1145/2935764.2935771","url":null,"abstract":"In this paper we consider the power of migration in heterogeneous machines settings and general profit scheduling. We begin by showing that on related machines or on related machines with restricted assignment, any migratory algorithm can be simulated by a non-migratory algorithm given 1+ε speed augmentation and O(1/ε) and O(1/ε^2) machine augmentation, respectively, for any 0 < ε ≤ 1. Similar results were only known in the case of identical machines and our results effectively show that migration does not give too much additional power to an algorithm, even in heterogeneous environments. Our results are constructive and can be computed efficiently in the offline setting. We complement our result by showing that there exist migratory schedules on related machines which require Ω(1/ε) machine augmentation with (1+ε)-speed to be simulated by any non-migratory scheduler for any 0 < ε ≤ 1/2, showing that machine augmentation without speed augmentation is insufficient for a non-migratory scheduler to simulate a migratory scheduler. We then use these results to study general profit scheduling where a set of n jobs arrive over time online and every job i has a function g_i(t) specifying the profit of completing job i at time t. The goal of the schedule is to maximize the total profit obtained. We give a (1+ε)-speed O(1/ε^2)-competitive algorithm in the unrelated machines setting for any ε > 0 when compared against a non-migratory adversary. Previous results were only known in the identical machines setting. 
As an example of the usefulness of the previous results on migration, combined with the results on general profit scheduling they give a (1+ε)-speed O(1/ε^4)-competitive algorithm for general profit scheduling when comparing against a migratory algorithm on related machines with restricted assignment for any ε > 0.","PeriodicalId":346939,"journal":{"name":"Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134223812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}