{"title":"Conversion of Boolean and Integer FlatZinc Builtins to Quadratic or Linear Integer Problems","authors":"Armin Wolf","doi":"arxiv-2404.12797","DOIUrl":"https://doi.org/arxiv-2404.12797","url":null,"abstract":"Constraint satisfaction or optimisation models -- even if they are formulated in high-level modelling languages -- need to be reduced to an equivalent format before they can be solved by the use of Quantum Computing. In this paper we show how Boolean and integer FlatZinc builtins over finite-domain integer variables can be equivalently reformulated as linear equations, linear inequalities or binary products of those variables, i.e. as finite-domain quadratic integer programs. Those quadratic integer programs can be further transformed into equivalent Quadratic Unconstrained Binary Optimisation problem models, i.e. a general format for optimisation problems to be solved on Quantum Computers, especially on Quantum Annealers.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140636287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness and Accuracy in Pipelined Bi-Conjugate Gradient Stabilized Method: A Comparative Study","authors":"Mykhailo Havdiak, Jose I. Aliaga, Roman Iakymchuk","doi":"arxiv-2404.13216","DOIUrl":"https://doi.org/arxiv-2404.13216","url":null,"abstract":"In this article, we propose an accuracy-assuring technique for finding a solution for unsymmetric linear systems. Such problems are related to different areas such as image processing, computer vision, and computational fluid dynamics. Parallel implementation of Krylov subspace methods speeds up finding approximate solutions for linear systems. In this context, the refined approach in pipelined BiCGStab enhances scalability on distributed memory machines, yielding substantial speed improvements compared to the standard BiCGStab method. However, the pipelined BiCGStab algorithm sacrifices some accuracy, which is stabilized with the residual replacement technique. This paper aims to address this issue by employing the ExBLAS-based reproducible approach. We validate the idea on a set of matrices from the SuiteSparse Matrix Collection.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"90 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140800879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GALÆXI: Solving complex compressible flows with high-order discontinuous Galerkin methods on accelerator-based systems","authors":"Daniel Kempf, Marius Kurz, Marcel Blind, Patrick Kopper, Philipp Offenhäuser, Anna Schwarz, Spencer Starr, Jens Keim, Andrea Beck","doi":"arxiv-2404.12703","DOIUrl":"https://doi.org/arxiv-2404.12703","url":null,"abstract":"This work presents GALÆXI as a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured meshes leveraging the parallel computing power of modern Graphics Processing Units (GPUs). GALÆXI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) using shock capturing with a finite-volume subcell approach to ensure the stability of the high-order scheme near shocks. This work provides details on the general code design, the parallelization strategy, and the implementation approach for the compute kernels with a focus on the element-local mappings between volume and surface data due to the unstructured mesh. GALÆXI exhibits excellent strong scaling properties up to 1024 GPUs if each GPU is assigned a minimum of one million degrees of freedom. To verify its implementation, a convergence study is performed that recovers the theoretical order of convergence of the implemented numerical schemes. Moreover, the solver is validated using both the incompressible and compressible formulations of the Taylor-Green-Vortex at Mach numbers of 0.1 and 1.25, respectively. A mesh convergence study shows that the results converge to the high-fidelity reference solution and that the results match the original CPU implementation. Finally, GALÆXI is applied to a large-scale wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37. Here, the supersonic region and shocks at the leading edge are captured accurately and robustly by the implemented shock-capturing approach. It is demonstrated that GALÆXI requires less than half of the energy to carry out this simulation in comparison to the reference CPU implementation. This renders GALÆXI a potent tool for accurate and efficient simulations of compressible flows in the realm of exascale computing and the associated new HPC architectures.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140636627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Confirmable Workflows in OSCAR","authors":"Michael Joswig, Lars Kastner, Benjamin Lorenz","doi":"arxiv-2404.06241","DOIUrl":"https://doi.org/arxiv-2404.06241","url":null,"abstract":"We discuss what is special about the reproducibility of workflows in computer algebra. It is emphasized how the programming language Julia and the new computer algebra system OSCAR support such reproducibility, and how users can benefit in their own work.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140560308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers","authors":"Paul Scheffler, Luca Colagrande, Luca Benini","doi":"arxiv-2404.05303","DOIUrl":"https://doi.org/arxiv-2404.05303","url":null,"abstract":"Stencil codes are performance-critical in many compute-intensive applications, but suffer from significant address calculation and irregular memory access overheads. This work presents SARIS, a general and highly flexible methodology for stencil acceleration using register-mapped indirect streams. We demonstrate SARIS for various stencil codes on an eight-core RISC-V compute cluster with indirect stream registers, achieving significant speedups of 2.72x, near-ideal FPU utilizations of 81%, and energy efficiency improvements of 1.58x over an RV32G baseline on average. Scaling out to a 256-core manycore system, we estimate an average FPU utilization of 64%, an average speedup of 2.14x, and up to 15% higher fractions of peak compute than a leading GPU code generator.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"29 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140560305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive Formal Specification for Mathematical Problems of Engineers","authors":"Walther Neuper (JKU - Johannes Kepler Universität Linz)","doi":"arxiv-2404.05462","DOIUrl":"https://doi.org/arxiv-2404.05462","url":null,"abstract":"The paper presents the second part of a precise description of the prototype that has been developed in the course of the ISAC project over the last two decades. This part describes the \"specify-phase\", while the first part describing the \"solve-phase\" is already published. In the specify-phase a student interactively constructs a formal specification. The ISAC prototype implements formal specifications as established in theoretical computer science; however, the input language for the construction avoids requiring users to have knowledge of logic; this makes the system useful for various engineering faculties (and also for high school). The paper discusses not only ISAC's design of the specify-phase in detail, but also gives a brief introduction to implementation with the aim of advertising the re-use of formal frameworks (including their respective front-ends) with their generic tools for language definition and their rich pool of software components for formal mathematics.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140560310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predefined Software Environment Runtimes As A Measure For Reproducibility","authors":"Aaruni Kaushik","doi":"arxiv-2404.05563","DOIUrl":"https://doi.org/arxiv-2404.05563","url":null,"abstract":"As part of the Mathematical Research Data Initiative (MaRDI), we have developed a way to preserve a software package into an easy to deploy and use sandbox environment we call a \"runtime\", via a program we developed called MaPS: MaRDI Packaging System. The program relies on Linux user namespaces to isolate a library environment from the host system, making the sandboxed software reproducible on other systems, with minimal effort. Moreover, an overlay filesystem makes local edits persistent. This project will aid reproducibility efforts of research papers: both mathematical and from other disciplines. As a proof of concept, we provide runtimes for the OSCAR Computer Algebra System, polymake software for research in polyhedral geometry, and VIBRANT Virus Identification By iteRative ANnoTation. The software is in a prerelease state: the interface for creating, deploying, and executing runtimes is final, and an interface for easily publishing runtimes is under active development. We thus propose publishing predefined, distributable software environment runtimes along with research papers in an effort to make research with software-based results reproducible.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140560309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A shared compilation stack for distributed-memory parallelism in stencil DSLs","authors":"George Bisbas, Anton Lydike, Emilien Bauer, Nick Brown, Mathieu Fehr, Lawrence Mitchell, Gabriel Rodriguez-Canal, Maurice Jamieson, Paul H. J. Kelly, Michel Steuwer, Tobias Grosser","doi":"arxiv-2404.02218","DOIUrl":"https://doi.org/arxiv-2404.02218","url":null,"abstract":"Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target current- and next-generation supercomputers. The convenience and performance of DSLs come with significant development and maintenance costs. The siloed design of DSL compilers and the resulting inability to benefit from shared infrastructure cause uncertainties around longevity and the adoption of DSLs at scale. By tailoring the broadly-adopted MLIR compiler framework to HPC, we bring the same synergies that the machine learning community already exploits across their DSLs (e.g. TensorFlow, PyTorch) to the finite-difference stencil HPC community. We introduce new HPC-specific abstractions for message passing targeting distributed stencil computations. We demonstrate the sharing of common components across three distinct HPC stencil-DSL compilers: Devito, PSyclone, and the Open Earth Compiler, showing that our framework generates high-performance executables based upon a shared compiler ecosystem.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140560440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root","authors":"Lucas M. Dutton, Christopher Kumar Anand, Robert Enenkel, Silvia Melitta Müller","doi":"arxiv-2404.00387","DOIUrl":"https://doi.org/arxiv-2404.00387","url":null,"abstract":"Floating-point arithmetic performance determines the overall performance of important applications, from graphics to AI. Meeting the IEEE-754 specification for floating-point requires that final results of addition, subtraction, multiplication, division, and square root are correctly rounded based on the user-selected rounding mode. A frustrating fact for implementers is that naive rounding methods will not produce correctly rounded results even when intermediate results with greater accuracy and precision are available. In contrast, our novel algorithm can correct approximations of reciprocal, division and square root, even ones with slightly lower than target precision. In this paper, we present a family of algorithms that can both increase the accuracy (and potentially the precision) of an estimate and correctly round it according to all binary IEEE-754 rounding modes. We explain how it may be efficiently implemented in hardware, and for completeness, we present proofs that it is not necessary to include equality tests associated with round-to-nearest-even mode for reciprocal, division and square root functions, because it is impossible for input(s) in a given precision to have exact answers exactly midway between representable floating-point numbers in that precision. In fact, our simpler proofs are sometimes stronger.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140560307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Lie Group Approach to Riemannian Batch Normalization","authors":"Ziheng Chen, Yue Song, Yunmei Liu, Nicu Sebe","doi":"arxiv-2403.11261","DOIUrl":"https://doi.org/arxiv-2403.11261","url":null,"abstract":"Manifold-valued measurements exist in numerous applications within computer vision and machine learning. Recent studies have extended Deep Neural Networks (DNNs) to manifolds, and concomitantly, normalization techniques have also been adapted to several manifolds, referred to as Riemannian normalization. Nonetheless, most of the existing Riemannian normalization methods have been derived in an ad hoc manner and only apply to specific manifolds. This paper establishes a unified framework for Riemannian Batch Normalization (RBN) techniques on Lie groups. Our framework offers the theoretical guarantee of controlling both the Riemannian mean and variance. Empirically, we focus on Symmetric Positive Definite (SPD) manifolds, which possess three distinct types of Lie group structures. Using the deformation concept, we generalize the existing Lie groups on SPD manifolds into three families of parameterized Lie groups. Specific normalization layers induced by these Lie groups are then proposed for SPD neural networks. We demonstrate the effectiveness of our approach through three sets of experiments: radar recognition, human action recognition, and electroencephalography (EEG) classification. The code is available at https://github.com/GitZH-Chen/LieBN.git.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}