Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms. SIAM Conference on Applied and Computational Discrete Algorithms (2021 : Online): Latest Articles
Rigel Galgana, Cengke Shi, A. Greenwald, Takehiro Oyakawa
{"title":"A Dynamic Program for Computing the Joint Cumulative Distribution Function of Order Statistics","authors":"Rigel Galgana, Cengke Shi, A. Greenwald, Takehiro Oyakawa","doi":"10.1137/1.9781611976830.15","DOIUrl":"https://doi.org/10.1137/1.9781611976830.15","url":null,"abstract":"","PeriodicalId":93610,"journal":{"name":"Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms. SIAM Conference on Applied and Computational Discrete Algorithms (2021 : Online)","volume":"1 1","pages":"160-170"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89823503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thomas Lavastida, Benjamin Moseley, R. Ravi, Chenyang Xu
{"title":"Using Predicted Weights for Ad Delivery","authors":"Thomas Lavastida, Benjamin Moseley, R. Ravi, Chenyang Xu","doi":"10.1137/1.9781611976830.3","DOIUrl":"https://doi.org/10.1137/1.9781611976830.3","url":null,"abstract":"We study the performance of a proportional weights algorithm for online capacitated bipartite matching modeling the delivery of impression ads. The algorithm uses predictions on the advertiser nodes to match arriving impression nodes fractionally in proportion to the weights of its neighbors. This paper gives a thorough empirical study of the performance of the algorithm on a data-set of ad impressions from Yahoo! and shows its superior performance compared to natural baselines such as a greedy water-filling algorithm and the ranking algorithm. The proportional weights algorithm has recently received interest in the theoretical literature where it was shown to have strong guarantees beyond the worst-case model of algorithms augmented with predictions. We extend these results to the case where the advertisers' capacities are no longer stationary over time. Additionally, we show the algorithm has near optimal performance in the random-order arrival model when the number of impressions and the optimal matching are sufficiently large.","PeriodicalId":93610,"journal":{"name":"Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms. SIAM Conference on Applied and Computational Discrete Algorithms (2021 : Online)","volume":"14 1","pages":"21-31"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82645082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
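The fractional matching rule described in the abstract above can be illustrated with a minimal sketch. It assumes each arriving impression is split among its neighboring advertisers in proportion to their predicted weights, and that an advertiser's matched amount is capped at its capacity. The function name, the dictionary layout, and the capping rule are illustrative assumptions, not the authors' implementation.

```python
def proportional_allocation(weights, capacities, impressions):
    """Split each impression among its neighbors in proportion to their weights.

    weights:     dict advertiser -> predicted weight (assumed positive)
    capacities:  dict advertiser -> capacity
    impressions: list of neighbor lists, one per arriving impression
    Returns (load, matched): the fractional load per advertiser and the
    total matched amount after capping each load at its capacity.
    """
    load = {a: 0.0 for a in weights}
    for neighbors in impressions:
        total = sum(weights[a] for a in neighbors)
        if total <= 0:
            continue  # no usable neighbor: the impression stays unmatched
        for a in neighbors:
            load[a] += weights[a] / total
    # Assumption: load beyond an advertiser's capacity contributes nothing.
    matched = sum(min(capacities[a], load[a]) for a in weights)
    return load, matched


if __name__ == "__main__":
    load, matched = proportional_allocation(
        {"A": 1.0, "B": 3.0},
        {"A": 1, "B": 1},
        [["A", "B"], ["A"]],
    )
    print(load, matched)
```

In this toy instance advertiser A is over-allocated, which is the situation in which the quality of the predicted weights matters.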
{"title":"Efficient Parallel Sparse Symmetric Tucker Decomposition for High-Order Tensors","authors":"Shruti Shivakumar, Jiajia Li, R. Kannan, S. Aluru","doi":"10.1137/1.9781611976830.18","DOIUrl":"https://doi.org/10.1137/1.9781611976830.18","url":null,"abstract":"","PeriodicalId":93610,"journal":{"name":"Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms. SIAM Conference on Applied and Computational Discrete Algorithms (2021 : Online)","volume":"115 1","pages":"193-204"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86926999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multidimensional Included and Excluded Sums","authors":"Helen Xu, Sean Fraser, C. Leiserson","doi":"10.1137/1.9781611976830.17","DOIUrl":"https://doi.org/10.1137/1.9781611976830.17","url":null,"abstract":"This paper presents algorithms for the included-sums and excluded-sums problems used by scientific computing applications such as the fast multipole method. These problems are defined in terms of a $d$-dimensional array of $N$ elements and a binary associative operator $\\oplus$ on the elements. The included-sum problem requires that the elements within overlapping boxes cornered at each element within the array be reduced using $\\oplus$. The excluded-sum problem reduces the elements outside each box. The weak versions of these problems assume that the operator $\\oplus$ has an inverse $\\ominus$, whereas the strong versions do not require this assumption. In addition to studying existing algorithms to solve these problems, we introduce three new algorithms. The bidirectional box-sum (BDBS) algorithm solves the strong included-sums problem in $\\Theta(d N)$ time, asymptotically beating the classical summed-area table (SAT) algorithm, which runs in $\\Theta(2^d N)$ and which only solves the weak version of the problem. Empirically, the BDBS algorithm outperforms the SAT algorithm in higher dimensions by up to $17.1\\times$. The box-complement algorithm can solve the strong excluded-sums problem in $\\Theta(d N)$ time, asymptotically beating the state-of-the-art corners algorithm by Demaine et al., which runs in $\\Omega(2^d N)$ time. In 3 dimensions the box-complement algorithm empirically outperforms the corners algorithm by about $1.4\\times$ given similar amounts of space. The weak excluded-sums problem can be solved in $\\Theta(d N)$ time by the bidirectional box-sum complement (BDBSC) algorithm, which is a trivial extension of the BDBS algorithm. Given an operator inverse $\\ominus$, BDBSC can beat box-complement by up to a factor of $4$.","PeriodicalId":93610,"journal":{"name":"Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms. SIAM Conference on Applied and Computational Discrete Algorithms (2021 : Online)","volume":"13 1","pages":"182-192"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75474394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
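To make the strong included-sums problem from the abstract above concrete, here is a one-dimensional sketch that needs only an associative operator and no inverse. It combines block-aligned prefix and suffix reductions, a classical sliding-window technique. It is offered as an illustration of the problem statement, not as the authors' BDBS code. The function name and the choice to truncate boxes at the end of the array are assumptions.

```python
import operator


def included_sums_1d(a, k, op):
    """For every i, reduce a[i : i + k] with op (boxes truncated at the array end).

    Works for any associative op; no inverse is required, so it also
    handles operators such as max. Runs in O(len(a)) applications of op.
    """
    n = len(a)
    if n == 0:
        return []
    # prefix[i]: reduction from the start of i's length-k block through i.
    prefix = list(a)
    for i in range(1, n):
        if i % k != 0:
            prefix[i] = op(prefix[i - 1], a[i])
    # suffix[i]: reduction from i through the end of i's block (or the array).
    suffix = list(a)
    for i in range(n - 2, -1, -1):
        if (i + 1) % k != 0:
            suffix[i] = op(a[i], suffix[i + 1])
    out = []
    for i in range(n):
        j = min(i + k - 1, n - 1)  # last index of the (possibly truncated) box
        if j // k == i // k:
            out.append(suffix[i])  # box lies inside a single block
        else:
            out.append(op(suffix[i], prefix[j]))  # box straddles two blocks
    return out


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    print(included_sums_1d(data, 3, operator.add))
    print(included_sums_1d(data, 3, max))
```

Because each output combines at most one suffix and one prefix, the cost per element is constant regardless of the box size k.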
{"title":"A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver","authors":"Nan Ding, Yang Liu, Samuel Williams, X. Li","doi":"10.1137/1.9781611976830.14","DOIUrl":"https://doi.org/10.1137/1.9781611976830.14","url":null,"abstract":"Sparse triangular solve is used in conjunction with Sparse LU for solving sparse linear systems, either as a direct solver or as a preconditioner. As GPUs have become a first-class compute citizen, designing an efficient and scalable SpTRSV on multi-GPU HPC systems is imperative. In this paper, we leverage the advantage of GPU-initiated data transfers of NVSHMEM to implement and evaluate a Multi-GPU SpTRSV. We create a novel producer-consumer paradigm to manage the computation and communication in SpTRSV and implement it using two CUDA streams. Our multi-GPU SpTRSV implementation using CUDA streams achieves a 3.7× speedup when using twelve GPUs (two nodes) relative to our implementation on a single GPU, and up to 6.1× compared to cusparse_csrsv2() over the range of one to eighteen GPUs. To further explain the observed performance and explore the key features of matrices to estimate the potential performance benefits when using multi-GPU, we extend the critical path model of SpTRSV to GPUs. We demonstrate the ability of our performance model to understand various aspects of performance and performance bottlenecks on multi-GPU and motivate code","PeriodicalId":93610,"journal":{"name":"Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms. SIAM Conference on Applied and Computational Discrete Algorithms (2021 : Online)","volume":"86 1","pages":"147-159"},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76816833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
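For readers unfamiliar with the sparse triangular solve (SpTRSV) named in the abstract above, the following is a sequential baseline sketch: forward substitution on a lower-triangular matrix stored in CSR form. It does not model the multi-GPU, NVSHMEM-based design of the paper. The function name and the plain-list CSR layout (`indptr`, `indices`, `data`) are assumptions chosen for illustration.

```python
def sptrsv_lower_csr(indptr, indices, data, b):
    """Solve L x = b by forward substitution, with L lower triangular in CSR.

    indptr, indices, data: the usual CSR arrays (row pointers, column
    indices, values). Entries above the diagonal, if any, are ignored.
    """
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            if j < i:
                acc -= data[p] * x[j]  # x[j] is already known: row j < i
            elif j == i:
                diag = data[p]
        if diag is None or diag == 0:
            raise ValueError(f"zero or missing diagonal entry in row {i}")
        x[i] = acc / diag
    return x


if __name__ == "__main__":
    # L = [[2, 0, 0], [1, 3, 0], [0, 4, 5]], b = L @ [1, 2, 3]
    indptr = [0, 1, 3, 5]
    indices = [0, 0, 1, 1, 2]
    data = [2.0, 1.0, 3.0, 4.0, 5.0]
    print(sptrsv_lower_csr(indptr, indices, data, [2.0, 7.0, 23.0]))
```

Row i can only be solved after every row j < i that it references, and this dependency chain is what makes parallelizing SpTRSV difficult.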
{"title":"Parallel Clique Counting and Peeling Algorithms","authors":"Jessica Shi, Laxman Dhulipala, Julian Shun","doi":"10.1137/1.9781611976830.13","DOIUrl":"https://doi.org/10.1137/1.9781611976830.13","url":null,"abstract":"Dense subgraphs capture strong communities in social networks and entities possessing strong interactions in biological networks. In particular, $k$-clique counting and listing have applications in identifying important actors in a graph. However, finding $k$-cliques is computationally expensive, and thus it is important to have fast parallel algorithms. We present a new parallel algorithm for $k$-clique counting that has polylogarithmic span and is work-efficient with respect to the well-known sequential algorithm for $k$-clique listing by Chiba and Nishizeki. Our algorithm can be extended to support listing and enumeration, and is based on computing low out-degree orientations. We present a new linear-work and polylogarithmic span algorithm for computing such orientations, and new parallel algorithms for producing unbiased estimations of clique counts. Finally, we design new parallel work-efficient algorithms for approximating the $k$-clique densest subgraph. Our first algorithm gives a $1/k$-approximation and is based on iteratively peeling vertices with the lowest clique counts; our algorithm is work-efficient, but we prove that this process is P-complete and hence does not have polylogarithmic span. Our second algorithm gives a $1/(k(1+\\epsilon))$-approximation, is work-efficient, and has polylogarithmic span. In addition, we implement these algorithms and propose optimizations. On a 60-core machine, we achieve 13.23-38.99x and 1.19-13.76x self-relative parallel speedup for $k$-clique counting and $k$-clique densest subgraph, respectively. Compared to the state-of-the-art parallel $k$-clique counting algorithms, we achieve a 1.31-9.88x speedup, and compared to existing implementations of $k$-clique densest subgraph, we achieve a 1.01-11.83x speedup. We are able to compute the $4$-clique counts on the largest publicly-available graph with over two hundred billion edges.","PeriodicalId":93610,"journal":{"name":"Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms. SIAM Conference on Applied and Computational Discrete Algorithms (2021 : Online)","volume":"24 1","pages":"135-146"},"PeriodicalIF":0.0,"publicationDate":"2020-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86082991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
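The orientation-based approach to $k$-clique counting mentioned in the abstract above can be sketched sequentially. The version below directs each edge from the lower-ranked to the higher-ranked endpoint and then recursively intersects out-neighborhoods, so each clique is found exactly once. Ranking vertices by (degree, id) is a simplifying assumption standing in for the low out-degree orientations of the paper, and the code is neither parallel nor the authors' implementation.

```python
def count_k_cliques(n, edges, k):
    """Count k-cliques in a simple undirected graph on vertices 0..n-1.

    edges: iterable of (u, v) pairs with u != v and no duplicates (assumed).
    Requires k >= 1.
    """
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def rank(v):
        return (degree[v], v)  # assumed ordering: degree, ties broken by id

    # Orient every edge from the lower-ranked to the higher-ranked endpoint.
    out = [set() for _ in range(n)]
    for u, v in edges:
        if rank(u) < rank(v):
            out[u].add(v)
        else:
            out[v].add(u)

    def extend(candidates, remaining):
        # candidates: vertices adjacent to, and ranked above, all chosen ones.
        if remaining == 1:
            return len(candidates)
        return sum(extend(candidates & out[v], remaining - 1) for v in candidates)

    return extend(set(range(n)), k)


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_k_cliques(4, k4, 3), count_k_cliques(4, k4, 4))
```

Keeping out-degrees small bounds the size of every candidate set, which is why the choice of orientation governs the total work.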