Proceedings of the International Workshop on Parallel Symbolic Computation: Latest Articles

Meataxe64: High performance linear algebra over finite fields
Proceedings of the International Workshop on Parallel Symbolic Computation Pub Date : 2017-07-23 DOI: 10.1145/3115936.3115947
R. Parker
Citations: 1
An Algorithm For Spliting Polynomial Systems Based On F4
Proceedings of the International Workshop on Parallel Symbolic Computation Pub Date : 2017-07-23 DOI: 10.1145/3115936.3115948
M. Monagan, Roman Pearce
Abstract: We present algorithms for splitting polynomial systems using Gröbner bases. For zero-dimensional systems, we use FGLM to compute univariate polynomials and factor them, placing the ideal into general position if necessary. For positive-dimensional systems, we successively eliminate variables using F4 and use the leading coefficients of the last variable to split the system. We also present a known optimization to reduce the cost of zero-reductions in F4, an improvement for FGLM over the rationals, and an algorithm for quickly detecting redundant ideals in a decomposition.
Citations: 5
Fast Parallel Multi-point Evaluation of Sparse Polynomials
Proceedings of the International Workshop on Parallel Symbolic Computation Pub Date : 2017-07-23 DOI: 10.1145/3115936.3115940
M. Monagan, Alan Wong
Abstract: We present a parallel algorithm to evaluate a sparse polynomial in Z_p[x_0, ..., x_n] into many bivariate images, based on the fast multi-point evaluation technique described by van der Hoeven and Lecerf [11]. We have implemented the fast parallel algorithm in Cilk C. We present benchmarks demonstrating good parallel speedup for multi-core computers. Our algorithm was developed with a specific application in mind, namely, the sparse polynomial GCD algorithm of Hu and Monagan [6], which requires evaluations of this form. We present benchmarks showing a large speedup for the polynomial GCD problem.
Citations: 3
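To make "evaluating into bivariate images" concrete, here is a minimal sequential sketch in Python. It is a straightforward baseline, not the fast multi-point technique of [11] and not the authors' Cilk C code. The function name `bivariate_images`, the dictionary representation, and the choice of evaluation points (powers of fixed values `betas`) are assumptions made for illustration.

```python
from collections import defaultdict


def bivariate_images(poly, betas, num_images, p):
    """Evaluate a sparse polynomial into bivariate images modulo a prime p.

    poly maps exponent tuples (e0, e1, e2, ..., en) to coefficients.
    betas holds one value for each of x2..xn (hypothetical evaluation points).
    Image j, for j = 1..num_images, substitutes x_i = betas[i-2]**j and keeps
    x0 and x1 symbolic. Returns a list of dicts {(e0, e1): coefficient}.
    """
    # Precompute, for every term, the value of its monomial in x2..xn.
    terms = []
    for exps, coeff in poly.items():
        m = 1
        for beta, e in zip(betas, exps[2:]):
            m = m * pow(beta, e, p) % p
        terms.append((exps[:2], m))
    current = [coeff % p for coeff in poly.values()]

    images = []
    for _ in range(num_images):
        image = defaultdict(int)
        for k, (key, m) in enumerate(terms):
            # Moving from image j-1 to image j costs one multiplication per term.
            current[k] = current[k] * m % p
            image[key] = (image[key] + current[k]) % p
        images.append({key: c for key, c in image.items() if c})
    return images
```

Each term's updates are independent of the others, which is what makes this kind of evaluation a natural target for parallel execution.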
Multithreaded programming on the GPU: pointers and hints for the computer algebraist
Proceedings of the International Workshop on Parallel Symbolic Computation Pub Date : 2017-07-23 DOI: 10.1145/3115936.3115939
M. M. Maza
Abstract: It is well-known that the advent of hardware acceleration technologies (multicore processors, graphics processing units, field programmable gate arrays) provides vast opportunities for innovation in computing. In particular, GPUs combined with low-level heterogeneous programming models, such as CUDA (the Compute Unified Device Architecture, see [6, 7]), brought super-computing to the level of the desktop computer. However, these low-level programming models carry notable challenges, even to expert programmers. Indeed, fully exploiting the power of hardware accelerators by writing CUDA code often requires significant code optimization effort. This two-hour tutorial attempts to cover the key principles that computer algebraists interested in GPU programming should have in mind. The first half introduces the basics of GPU architecture and the CUDA programming model: no preliminary experience with GPU programming will be assumed; see [10] for a reference. In the second hour, we shall discuss the recent developments in terms of GPU architecture (e.g. dynamic parallelism [12]) and programming models (e.g. OpenMP [1, 9] and OpenACC [8, 11]) as well as techniques for improving code performance (e.g. MWP-CWP model [4], TMM model [5], MCM model [3]). Illustrative examples are taken from the CUMODP library [2] for dense polynomial arithmetic over finite fields.
Citations: 0
Towards Generic Scalable Parallel Combinatorial Search
Proceedings of the International Workshop on Parallel Symbolic Computation Pub Date : 2017-07-23 DOI: 10.1145/3115936.3115942
B. Archibald, Patrick Maier, Robert J. Stewart, P. Trinder, J. Beule
Abstract: Combinatorial search problems in mathematics, e.g. in finite geometry, are notoriously hard; a state-of-the-art backtracking search algorithm can easily take months to solve a single problem. There is clearly demand for parallel combinatorial search algorithms scaling to hundreds of cores and beyond. However, backtracking combinatorial searches are challenging to parallelise due to their sensitivity to search order and due to their irregularly shaped search trees. Moreover, scaling parallel search to hundreds of cores generally requires highly specialist parallel programming expertise. This paper proposes a generic scalable framework for solving hard combinatorial problems. Key elements are distributed memory task parallelism (to achieve scale), work stealing (to cope with irregularity), and generic algorithmic skeletons for combinatorial search (to reduce the parallelism expertise required). We outline two implementations: a mature Haskell Tree Search Library (HTSL) based around algorithmic skeletons and a prototype C++ Tree Search Library (CTSL) that uses hand-coded applications. Experiments on maximum clique problems and on a problem in finite geometry, the search for spreads in H(4, 2^2), show that (1) CTSL consistently outperforms HTSL on sequential runs, and (2) both libraries scale to 200 cores, e.g. speeding up spreads search by a factor of 81 (HTSL) and 60 (CTSL), respectively. This demonstrates the potential of our generic framework for scaling parallel combinatorial search to large distributed memory platforms.
Citations: 5
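The idea of a generic search skeleton, instantiated for maximum clique, can be sketched sequentially in Python. This is only an illustration of the separation between skeleton and application; it is not HTSL or CTSL, and it shows none of the distributed task parallelism or work stealing the abstract describes. The names `branch_and_bound` and `max_clique` are hypothetical.

```python
def branch_and_bound(root, children, objective, upper_bound):
    """Depth-first optimisation skeleton; returns the best node encountered.

    The skeleton knows nothing about the problem: the application supplies
    the node generator, the objective, and an optimistic bound for pruning.
    """
    best = root
    stack = [root]
    while stack:
        node = stack.pop()
        if objective(node) > objective(best):
            best = node
        # Expand only subtrees whose bound can still beat the incumbent.
        if upper_bound(node) > objective(best):
            stack.extend(children(node))
    return best


def max_clique(adj):
    """Maximum clique of a graph given as {vertex: set of neighbours}.

    A search node is (clique, candidates): the clique built so far and the
    vertices that are adjacent to all of its members.
    """
    def children(node):
        clique, candidates = node
        remaining = set(candidates)
        result = []
        for v in sorted(candidates):
            remaining.discard(v)  # each clique is generated in one order only
            result.append((clique + (v,), frozenset(remaining & adj[v])))
        return result

    root = ((), frozenset(adj))
    best = branch_and_bound(
        root,
        children,
        objective=lambda node: len(node[0]),
        upper_bound=lambda node: len(node[0]) + len(node[1]),
    )
    return set(best[0])
```

Because pruning depends on the incumbent found so far, the amount of work varies with the order in which subtrees are explored, which is the sensitivity to search order that the abstract mentions.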
Parallel Fast Möbius (Reed-Muller) Transform and its Implementation with CUDA on GPUs
Proceedings of the International Workshop on Parallel Symbolic Computation Pub Date : 2017-07-23 DOI: 10.1145/3115936.3115941
D. Bikov, I. Bouyukliev
Abstract: One of the most important cryptographic characteristics of the Boolean and vector Boolean functions is the algebraic degree, which is connected with the Algebraic Normal Form. In this paper, we present an algorithm for computing the Algebraic Normal Form of a Boolean function using the binary Fast Möbius (Reed-Muller) Transform implemented in CUDA for parallel execution on GPUs. In the end, we give some experimental results.
Citations: 5
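The binary Möbius transform mentioned in the abstract is the standard butterfly algorithm that maps a truth table to Algebraic Normal Form coefficients. A minimal sequential sketch in Python follows; it is not the authors' CUDA implementation, and the function names and the bit-ordering convention are assumptions made for illustration.

```python
def mobius_transform(truth_table):
    """Return the ANF coefficients of a Boolean function.

    truth_table[i] is f(x), where bit j of the index i is the value of x_j.
    In the result, entry i is the coefficient of the monomial made of the
    variables x_j for which bit j of i is set.
    """
    n = len(truth_table)
    if n == 0 or n & (n - 1):
        raise ValueError("truth table length must be a power of two")
    a = list(truth_table)
    step = 1
    while step < n:
        # Butterfly: XOR the lower half of every block into its upper half.
        # All updates within one pass are independent of each other.
        for start in range(0, n, 2 * step):
            for i in range(start, start + step):
                a[i + step] ^= a[i]
        step *= 2
    return a


def algebraic_degree(anf):
    """Largest number of variables in a monomial with nonzero coefficient.

    Returns 0 for constant functions, including the zero function.
    """
    return max((bin(i).count("1") for i, c in enumerate(anf) if c), default=0)
```

For example, the truth table `[0, 1, 1, 1]` of `x0 OR x1` transforms to `[0, 1, 1, 1]`, that is `x0 + x1 + x0*x1`, of algebraic degree 2. Applying the transform twice returns the original table.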
Plain, and Somehow Sparse, Univariate Polynomial Division on Graphics Processing Units
Proceedings of the International Workshop on Parallel Symbolic Computation Pub Date : 2017-07-23 DOI: 10.1145/3115936.3115946
S. A. Haque, A. Hashemi, Davood Mohajerani, M. M. Maza
Abstract: We present multithreaded adaptations of the Euclidean plain division and the Euclidean GCD algorithms to the many-core GPU architectures. We report on implementation with NVIDIA CUDA and complexity analysis with an enhanced version of the PRAM model.
Citations: 0
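For reference, plain (schoolbook) Euclidean division and the Euclidean GCD can be sketched sequentially in Python as below. This is not the paper's multithreaded CUDA adaptation; working over Z_p for a prime p, the coefficient-list representation, and the function names are assumptions made for illustration.

```python
def poly_divmod_mod_p(a, b, p):
    """Plain Euclidean division of a by b in Z_p[x], with p prime.

    Polynomials are coefficient lists, lowest degree first, with no trailing
    zeros; the zero polynomial is []. Returns (quotient, remainder).
    """
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    r = [c % p for c in a]
    inv_lead = pow(b[-1], -1, p)  # modular inverse, needs Python 3.8+
    q = [0] * max(len(a) - len(b) + 1, 0)
    for k in range(len(a) - len(b), -1, -1):
        coeff = r[k + len(b) - 1] * inv_lead % p
        q[k] = coeff
        # Each coefficient update in this loop is independent of the others.
        for j, bj in enumerate(b):
            r[k + j] = (r[k + j] - coeff * bj) % p
    r = r[:len(b) - 1]
    while r and r[-1] == 0:
        r.pop()
    return q, r


def poly_gcd_mod_p(a, b, p):
    """Monic GCD in Z_p[x] by repeated plain division."""
    while b:
        a, b = b, poly_divmod_mod_p(a, b, p)[1]
    if a:
        inv_lead = pow(a[-1], -1, p)
        a = [c * inv_lead % p for c in a]
    return a
```

The outer loop over `k` is inherently sequential, since each quotient coefficient depends on the previous updates, while the inner update loop is the part that maps naturally onto many threads.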
High Performance Computing Experiments in Enumerative and Algebraic Combinatorics
Proceedings of the International Workshop on Parallel Symbolic Computation Pub Date : 2017-07-23 DOI: 10.1145/3115936.3115938
F. Hivert
Abstract: The goal of this abstract is to report on some parallel and high performance computations in combinatorics, each involving large datasets generated recursively. We start by presenting a small framework implemented in SageMath [12] that allows map/reduce-like computations to be performed on such recursively defined sets. In the second part, we describe a methodology used to achieve large speedups in several enumeration problems involving similar map/reduce computations. We illustrate this methodology on the challenging problem of counting the number of numerical semigroups [5], and briefly present another problem about enumerating integer vectors up to the action of a permutation group [2]. We believe that these techniques are fairly general for those kinds of algorithms.
Citations: 1
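A map/reduce computation over a recursively defined set, applied to counting numerical semigroups by genus, can be sketched in plain Python as follows. This is not the SageMath framework from the abstract nor the optimised methodology it reports. The sketch relies on the standard tree of numerical semigroups, in which the children of a semigroup are obtained by removing one minimal generator larger than its Frobenius number; the function names and the representation of a semigroup by its set of gaps are assumptions.

```python
def semigroup_children(gaps):
    """Children of a numerical semigroup in the semigroup tree.

    The semigroup is represented by the frozenset of its gaps. A child is
    obtained by turning one minimal generator s, with s greater than the
    Frobenius number (the largest gap), into a gap.
    """
    conductor = max(gaps, default=-1) + 1
    multiplicity = 1
    while multiplicity in gaps:
        multiplicity += 1
    # Minimal generators above the Frobenius number lie in this range.
    for s in range(max(conductor, 1), conductor + multiplicity + 1):
        # s is a minimal generator if it is not a sum of two nonzero elements.
        if all(a in gaps or (s - a) in gaps for a in range(1, s)):
            yield gaps | {s}


def map_reduce(root, children, map_fn, reduce_fn, init, max_depth):
    """Fold map_fn over all nodes of the tree down to depth max_depth."""
    acc = init
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        acc = reduce_fn(acc, map_fn(node))
        if depth < max_depth:
            stack.extend((child, depth + 1) for child in children(node))
    return acc


def count_by_genus(max_genus):
    """Number of numerical semigroups of each genus 0..max_genus."""
    def one_hot(gaps):
        return [int(g == len(gaps)) for g in range(max_genus + 1)]

    def add(u, v):
        return [x + y for x, y in zip(u, v)]

    return map_reduce(frozenset(), semigroup_children, one_hot, add,
                      [0] * (max_genus + 1), max_genus)
```

Since subtrees are explored independently and the reduction is associative, each subtree could be handed to a separate worker, which is the property a map/reduce formulation exposes.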
Compiler auto-vectorization of matrix multiplication modulo small primes
Proceedings of the International Workshop on Parallel Symbolic Computation Pub Date : 2017-07-23 DOI: 10.1145/3115936.3115943
Matthew A. Lambert, B. D. Saunders
Abstract: Modern CPUs have vector instruction sets such as SSE2 and AVX2 which support the bit-level operations (and, or, xor, etc.) as well as floating point and integer arithmetic. Furthermore, compilers such as g++ and Clang have auto-vectorization features to exploit the vector instructions. In this study we take advantage of these tools to improve performance of matrix multiplication over GF(2), GF(3), and other small fields. The purpose is to enhance performance of the Four Russians matrix multiplication algorithm, providing an efficient base case for multiplication of larger matrices using block decomposition as in Strassen's method. The essence of this environment is that word-level parallelism already exists, since multiple field elements are stuffed into a word. The hardware vector operations further enhance the needed vector operations of addition and scaling by small powers of 2. Arithmetic modulo 2 or 3 is achieved via bit-level operations. For other small fields the packing scheme is such that the vector addition and scaling operations must be followed by periodic normalization. We obtain an approximately 2- to 3-fold speedup over these arithmetics on 64-bit words by coaxing compiler exploitation of the 256-bit SIMD instructions.
Citations: 1
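The two ideas the abstract combines, packing many GF(2) elements into one word and the Four Russians table lookup, can be sketched in Python using integers as bit-packed rows. This illustrates the algorithmic structure only; it says nothing about SIMD code generation, and the function names and the packing convention (bit k of a row holds column k) are assumptions.

```python
def gf2_matmul_naive(a_rows, b_rows):
    """Reference product over GF(2): XOR the rows of B selected by each row of A."""
    result = []
    for a in a_rows:
        acc = 0
        for k, b in enumerate(b_rows):
            if (a >> k) & 1:
                acc ^= b  # one XOR adds a whole packed row
        result.append(acc)
    return result


def gf2_matmul_four_russians(a_rows, b_rows, t=4):
    """Four Russians product over GF(2) on bit-packed rows.

    a_rows[i] is row i of A as an int whose bit k is A[i][k]; b_rows[k] is
    row k of B packed the same way. Rows of B are processed t at a time.
    """
    c_rows = [0] * len(a_rows)
    for base in range(0, len(b_rows), t):
        block = b_rows[base:base + t]
        # table[m] is the XOR of the block rows selected by the bits of m.
        table = [0] * (1 << len(block))
        for m in range(1, len(table)):
            low = m & -m  # lowest set bit of m
            table[m] = table[m ^ low] ^ block[low.bit_length() - 1]
        mask = len(table) - 1
        for i, a in enumerate(a_rows):
            # One lookup replaces up to t conditional row additions.
            c_rows[i] ^= table[(a >> base) & mask]
    return c_rows
```

In a compiled implementation the packed rows would be arrays of machine words, and the row XOR is the loop a compiler can vectorize.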
Parallel Sparse PLUQ Factorization modulo p
Proceedings of the International Workshop on Parallel Symbolic Computation Pub Date : 2017-07-23 DOI: 10.1145/3115936.3115944
Charles Bouillaguet, Claire Delaplace, Marie-Emilie Voge
Abstract: In this paper, we present the results of our experiments to compute the rank of several large sparse matrices from Dumas's Sparse Integer Matrix Collection, by computing sparse PLUQ factorizations. Our approach consists in identifying as many pivots as possible before performing any arithmetic operation, based solely on the location of non-zero entries in the input matrix. These "structural" pivots are then all eliminated in parallel, in a single pass. We describe several heuristic structural pivot selection algorithms (the problem is NP-hard). These algorithms allow us to compute the ranks of several large sparse matrices in a few minutes, versus many days using Wiedemann's algorithm. Lastly, we describe a multi-thread implementation using OpenMP achieving 70% parallel efficiency on 24 cores on the largest benchmark.
Citations: 4
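To illustrate what a "structural" pivot is, the following Python sketch selects pivots using only the positions of non-zero entries. It implements one simple heuristic of this kind (for each column, take the sparsest row whose leftmost non-zero entry lies in that column); whether this particular heuristic is among those the paper evaluates is not stated in the abstract, so treat it as an assumption. The function name and the row representation are hypothetical.

```python
def structural_pivots(rows):
    """Select pivots from the sparsity pattern alone.

    rows is a list of dicts {column: nonzero value}, one per matrix row.
    Returns {pivot column: pivot row index}. No arithmetic on the values
    is performed.
    """
    pivots = {}
    for i, row in enumerate(rows):
        if not row:
            continue
        j = min(row)  # leftmost non-zero column of row i
        # Prefer the sparsest candidate row to limit fill-in later on.
        if j not in pivots or len(row) < len(rows[pivots[j]]):
            pivots[j] = i
    return pivots
```

Sorting the selected rows by pivot column gives a submatrix that is upper triangular with a non-zero diagonal, because each pivot row has no entries to the left of its pivot. The number of pivots found is therefore a lower bound on the rank, and only the remaining part of the matrix requires arithmetic modulo p.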