{"title":"A compact parallel implementation of F4","authors":"M. Monagan, Roman Pearce","doi":"10.1145/2790282.2790293","DOIUrl":"https://doi.org/10.1145/2790282.2790293","url":null,"abstract":"We present a compact and parallel C implementation of the F4 algorithm for computing Gröbner bases which uses Cilk. We give an easy way to parallelize the sparse linear algebra which is the main cost in practice. To obtain more speedup we attempted to parallelize the generation of sparse matrices as well. We present timings to assess the effectiveness of our approach and to compare our implementation to others.","PeriodicalId":384227,"journal":{"name":"Proceedings of the 2015 International Workshop on Parallel Symbolic Computation","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125976994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Direct solution of the (11,9,8)-MinRank problem by the block Wiedemann algorithm in magma with a tesla GPU","authors":"A. Steel","doi":"10.1145/2790282.2791392","DOIUrl":"https://doi.org/10.1145/2790282.2791392","url":null,"abstract":"We show how some very large multivariate polynomial systems over finite fields can be solved by Gröbner basis techniques coupled with the Block Wiedemann algorithm, thus extending the Wiedemann-based 'Sparse FGLM' approach of Faugère and Mou. The main components of our approach are a dense variant of the Faugère F4 Gröbner basis algorithm and the Block Wiedemann algorithm, which have been implemented within the Magma Computer Algebra System (released in version V2.20 in late 2014). A major feature of the algorithms is that they map much of the computation to dense matrix multiplication, and this allows dramatic speedups to be achieved for large examples when an Nvidia Tesla GPU is available. As a result, the Magma implementation can directly solve a 16-bit random instance of the Courtois (11,9,8)-MinRank Challenge C in about 15.1 hours with a single Intel Sandybridge CPU core coupled with an Nvidia Tesla K40 GPU.","PeriodicalId":384227,"journal":{"name":"Proceedings of the 2015 International Workshop on Parallel Symbolic Computation","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122614958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel algebraic linear algebra dedicated interface","authors":"T. Gautier, Jean-Louis Roch, Ziad Sultan, Bastien Vialla","doi":"10.1145/2790282.2790286","DOIUrl":"https://doi.org/10.1145/2790282.2790286","url":null,"abstract":"This work deals with parallelism in linear algebra routines. We propose a domain specific language based on C/C++ macros, PALADIn (Parallel Algebraic Linear Algebra Dedicated Interface). This domain specific language allows the user to write C++ code and benefit from sequential and parallel executions on shared memory architectures. With a unique syntax, the user can switch between different parallel runtime systems such as OpenMP, TBB and xKaapi. This interface provides data and task parallelism. Depending on the runtime system, task parallelism can use explicit synchronizations or data-dependency based synchronizations. Also, this language provides different matrix cutting strategies according to one or two dimensions. Moreover, block algorithms, such as block iterative and recursive matrix multiplication, can involve splitting according to three dimensions. The latter is also a feature that is provided to the user. The PALADIn interface can be used in any C++ library for linear algebra computation and gets the best performance from the three supported parallel runtime systems.","PeriodicalId":384227,"journal":{"name":"Proceedings of the 2015 International Workshop on Parallel Symbolic Computation","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131929414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid symbolic-numeric approach to exceptional sets of generically zero-dimensional systems","authors":"J. Hauenstein, Alan C. Liddell","doi":"10.1145/2790282.2790288","DOIUrl":"https://doi.org/10.1145/2790282.2790288","url":null,"abstract":"Exceptional sets are the sets where the dimension of the fiber of a map is larger than the generic fiber dimension, which we assume is zero. Such situations naturally arise in kinematics, for example, when designing a mechanism that moves when the generic case is rigid. In 2008, Sommese and Wampler showed that one can use fiber products to promote such sets to become irreducible components. We propose an alternative approach using rank constraints on Macaulay matrices. Symbolic computations are used to construct the proper Macaulay matrices, while numerical computations are used to solve the rank-constraint problem. Various exceptional sets are computed, including exceptional RR dyads, lines on surfaces in C^3, and exceptional planar pentads.","PeriodicalId":384227,"journal":{"name":"Proceedings of the 2015 International Workshop on Parallel Symbolic Computation","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115497915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel implementation for polynomial multiplication modulo a prime","authors":"M. Law, M. Monagan","doi":"10.1145/2790282.2790291","DOIUrl":"https://doi.org/10.1145/2790282.2790291","url":null,"abstract":"We present a parallel implementation in Cilk C of a modular algorithm for multiplying two polynomials in Z_q[x] for integer q > 1, for multi-core computers. Our algorithm uses Chinese remaindering. It multiplies modulo primes p_1, p_2, ... in parallel and uses a parallel FFT for each prime. Our software multiplies two polynomials of degree 10^9 modulo a 32 bit integer q in 83 seconds on a 20 core computer.","PeriodicalId":384227,"journal":{"name":"Proceedings of the 2015 International Workshop on Parallel Symbolic Computation","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126501816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel sparse multivariate polynomial division","authors":"M. Gastineau, J. Laskar","doi":"10.1145/2790282.2790285","DOIUrl":"https://doi.org/10.1145/2790282.2790285","url":null,"abstract":"We present a scalable algorithm for dividing two sparse multivariate polynomials represented in a distributed format on shared memory multicore computers. The scalability on the large number of cores is ensured by the lack of synchronizations during the main parallel step. The merge and sorting operations are based on binary heap or tree data structures.","PeriodicalId":384227,"journal":{"name":"Proceedings of the 2015 International Workshop on Parallel Symbolic Computation","volume":"50 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122423310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High performance implementation of the inverse TFT","authors":"Lingchuan Meng, Jeremy R. Johnson","doi":"10.1145/2790282.2790292","DOIUrl":"https://doi.org/10.1145/2790282.2790292","url":null,"abstract":"The inverse truncated Fourier transform (ITFT) is a key component in the fast polynomial and large integer algorithms introduced by van der Hoeven. This paper reports a high performance implementation of the ITFT which poses additional challenges compared to that of the forward transform. A general-radix variant of the ITFT algorithm is developed to allow the implementation to automatically adapt to the memory hierarchy. Then a parallel ITFT algorithm is developed that trades off small arithmetic cost for full vectorization and improved multi-threaded parallelism. The algorithms are automatically generated and tuned to produce an arbitrary-size ITFT library. The new algorithms and the implementation smooth out the staircase performance associated with power-of-two modular FFT implementations, and provide significant performance improvement over zero-padding approaches even when high-performance FFT libraries are used.","PeriodicalId":384227,"journal":{"name":"Proceedings of the 2015 International Workshop on Parallel Symbolic Computation","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134226862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing and parallelizing the modular GCD algorithm","authors":"Matthew Gibson, M. Monagan","doi":"10.1145/2790282.2790287","DOIUrl":"https://doi.org/10.1145/2790282.2790287","url":null,"abstract":"Our goal is to design and implement a high performance modular GCD algorithm for polynomial GCD computation in Z_p[x_1, x_2, ..., x_n] for multi-core computers which will be used to compute the GCD of polynomials over Z. For n = 2 we have designed and implemented in C a highly optimized serial code for primes p < 2^63. For n > 2 we parallelized in Cilk C Brown's dense modular GCD algorithm using our serial bivariate code at the base. For n = 3, we obtain good parallel speedup on multi-core computers with 16 and 20 cores. We also compare our code with the GCD codes in Maple and Magma.","PeriodicalId":384227,"journal":{"name":"Proceedings of the 2015 International Workshop on Parallel Symbolic Computation","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132155739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gröbner bases over algebraic number fields","authors":"Dereje Kifle Boku, C. Fieker, W. Decker, Andreas Steenpaß","doi":"10.1145/2790282.2790284","DOIUrl":"https://doi.org/10.1145/2790282.2790284","url":null,"abstract":"Although Buchberger's algorithm, in theory, allows us to compute Gröbner bases over any field, in practice, however, the computational efficiency depends on the arithmetic of the ground field. Consider a field K = Q(α), a simple extension of Q, where α is an algebraic number, and let f ∈ Q[t] be the minimal polynomial of α. In this paper we present a new efficient method to compute Gröbner bases in polynomial rings over the algebraic number field K. Starting from the ideas of Noro [11], we proceed by joining f to the ideal to be considered, adding t as an extra variable. But instead of avoiding superfluous S-pair reductions by inverting algebraic numbers, we achieve the same goal by applying modular methods as in [2, 3, 10], that is, by inferring information in characteristic zero from information in characteristic p > 0. For suitable primes p, the minimal polynomial f is reducible over Fp. This allows us to apply modular methods once again, on a second level, with respect to the factors of f. The algorithm thus resembles a divide and conquer strategy and is in particular easily parallelizable. At current state, the algorithm is probabilistic in the sense that, as for other modular Gröbner basis computations, an effective final verification test is only known for homogeneous ideals or for local monomial orderings. The presented timings show that for most examples, our algorithm, which has been implemented in Singular [7], outperforms other known methods by far.","PeriodicalId":384227,"journal":{"name":"Proceedings of the 2015 International Workshop on Parallel Symbolic Computation","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116844111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}