{"title":"许多顺序迭代算法可以并行且(接近)高效","authors":"Zheqi Shen, Zijin Wan, Yan Gu, Yihan Sun","doi":"10.1145/3490148.3538574","DOIUrl":null,"url":null,"abstract":"Some recent papers showed that many sequential iterative algorithms can be directly parallelized, by identifying the dependences between the input objects. This approach yields many simple and practical parallel algorithms, but there are still challenges to achieve work-efficiency and high-parallelism. Work-efficiency means that the number of operations is asymptotically the same as the best sequential solution. This can be hard for certain problems where the number of dependences between objects is asymptotically more than optimal sequential work, and we cannot even afford the cost to generate them. To achieve high-parallelism, we always want it to process as many objects as possible in parallel. The goal is to achieve O (D) span for a problem with the deepest dependence length D. We refer to this property as round-efficiency. This paper presents work-efficient and round-efficient algorithms for a variety of classic problems and propose general approaches to do so. To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects based on the order of their ranks. All objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms aim to use range queries to extract all objects with the same rank to avoid evaluating all the dependences. We discuss activity selection, and Dijkstra's algorithm using Type 1 framework. Type 2 algorithms aim to wake up an object when the last object it depends on is finished. We discuss activity selection, longest increasing subsequence (LIS), greedy maximal independent set (MIS), and many other algorithms using Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient, and some of them (e.g., LIS) are the first to achieve the both. Many of them improve the previous best bounds. Moreover, we implement many of them for experimental studies. On inputs with reasonable dependence depth, our algorithms are highly parallelized and significantly outperform their sequential counterparts.","PeriodicalId":112865,"journal":{"name":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","volume":"260 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Many Sequential Iterative Algorithms Can Be Parallel and (Nearly) Work-efficient\",\"authors\":\"Zheqi Shen, Zijin Wan, Yan Gu, Yihan Sun\",\"doi\":\"10.1145/3490148.3538574\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Some recent papers showed that many sequential iterative algorithms can be directly parallelized, by identifying the dependences between the input objects. This approach yields many simple and practical parallel algorithms, but there are still challenges to achieve work-efficiency and high-parallelism. Work-efficiency means that the number of operations is asymptotically the same as the best sequential solution. This can be hard for certain problems where the number of dependences between objects is asymptotically more than optimal sequential work, and we cannot even afford the cost to generate them. To achieve high-parallelism, we always want it to process as many objects as possible in parallel. The goal is to achieve O (D) span for a problem with the deepest dependence length D. We refer to this property as round-efficiency. This paper presents work-efficient and round-efficient algorithms for a variety of classic problems and propose general approaches to do so. To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects based on the order of their ranks. All objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms aim to use range queries to extract all objects with the same rank to avoid evaluating all the dependences. We discuss activity selection, and Dijkstra's algorithm using Type 1 framework. Type 2 algorithms aim to wake up an object when the last object it depends on is finished. We discuss activity selection, longest increasing subsequence (LIS), greedy maximal independent set (MIS), and many other algorithms using Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient, and some of them (e.g., LIS) are the first to achieve the both. Many of them improve the previous best bounds. Moreover, we implement many of them for experimental studies. On inputs with reasonable dependence depth, our algorithms are highly parallelized and significantly outperform their sequential counterparts.\",\"PeriodicalId\":112865,\"journal\":{\"name\":\"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures\",\"volume\":\"260 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3490148.3538574\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3490148.3538574","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Many Sequential Iterative Algorithms Can Be Parallel and (Nearly) Work-efficient
Some recent papers showed that many sequential iterative algorithms can be directly parallelized, by identifying the dependences between the input objects. This approach yields many simple and practical parallel algorithms, but there are still challenges to achieve work-efficiency and high-parallelism. Work-efficiency means that the number of operations is asymptotically the same as the best sequential solution. This can be hard for certain problems where the number of dependences between objects is asymptotically more than optimal sequential work, and we cannot even afford the cost to generate them. To achieve high-parallelism, we always want it to process as many objects as possible in parallel. The goal is to achieve O (D) span for a problem with the deepest dependence length D. We refer to this property as round-efficiency. This paper presents work-efficient and round-efficient algorithms for a variety of classic problems and propose general approaches to do so. To efficiently parallelize many sequential iterative algorithms, we propose the phase-parallel framework. The framework assigns a rank to each object and processes the objects based on the order of their ranks. All objects with the same rank can be processed in parallel. To enable work-efficiency and high parallelism, we use two types of general techniques. Type 1 algorithms aim to use range queries to extract all objects with the same rank to avoid evaluating all the dependences. We discuss activity selection, and Dijkstra's algorithm using Type 1 framework. Type 2 algorithms aim to wake up an object when the last object it depends on is finished. We discuss activity selection, longest increasing subsequence (LIS), greedy maximal independent set (MIS), and many other algorithms using Type 2 framework. All of our algorithms are (nearly) work-efficient and round-efficient, and some of them (e.g., LIS) are the first to achieve the both. Many of them improve the previous best bounds. Moreover, we implement many of them for experimental studies. On inputs with reasonable dependence depth, our algorithms are highly parallelized and significantly outperform their sequential counterparts.