Doru Thom Popovici, Mauro del Ben, Osni Marques, Andrew Canning
{"title":"Flexible Multi-Dimensional FFTs for Plane Wave Density Functional Theory Codes","authors":"Doru Thom Popovici, Mauro del Ben, Osni Marques, Andrew Canning","doi":"arxiv-2406.05577","DOIUrl":"https://doi.org/arxiv-2406.05577","url":null,"abstract":"Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications from materials science, physics, chemistry and even machine learning. Over the past years, a multitude of software packages targeting distributed multi-dimensional Fourier transforms have been developed. Most variants attempt to offer efficient implementations for single transforms applied on data mapped onto rectangular grids. However, not all scientific applications conform to this pattern, e.g. plane wave Density Functional Theory codes require multi-dimensional Fourier transforms applied on data represented as batches of spheres. Typically, the implementations for this use case are hand-coded and tailored for the requirements of each application. In this work, we present the Fastest Fourier Transform from Berkeley (FFTB), a distributed framework that offers flexible implementations for both regular/non-regular data grids and batched/non-batched transforms. We provide flexible implementations with a user-friendly API that captures most of the use cases. Furthermore, we provide implementations for both CPU and GPU platforms, showing that our approach offers improved execution time and scalability on the HPE Cray EX supercomputer. In addition, we outline the need for flexible implementations for different use cases of the software package.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"16 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"svds-C: A Multi-Thread C Code for Computing Truncated Singular Value Decomposition","authors":"Xu Feng, Wenjian Yu, Yuyang Xie","doi":"arxiv-2405.18966","DOIUrl":"https://doi.org/arxiv-2405.18966","url":null,"abstract":"This article presents svds-C, an open-source and high-performance C program for accurately and robustly computing truncated SVD, i.e. computing the several largest singular values and the corresponding singular vectors. We have re-implemented the algorithm of Matlab's svds in C, based on MKL or OpenBLAS and multi-thread computing, to obtain the parallel program named svds-C. svds-C running on a shared-memory computer consumes less time and memory than svds thanks to careful implementation of multi-thread parallelization and memory management. Numerical experiments on different test cases, which are synthetically generated or taken directly from real-world datasets, show that svds-C runs remarkably faster than svds, with a 4.7X speedup on average and at most 12X for 16-thread parallel computing on a computer with an Intel CPU, while preserving the same accuracy and consuming about half the memory. Experimental results also demonstrate that svds-C has similar advantages over svds on a computer with an AMD CPU, and outperforms other state-of-the-art algorithms for truncated SVD in computing time and robustness.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"34 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141195347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zachary J. Wegert, Jordi Manyer, Connor Mallon, Santiago Badia, Vivien J. Challis
{"title":"GridapTopOpt.jl: A scalable Julia toolbox for level set-based topology optimisation","authors":"Zachary J. Wegert, Jordi Manyer, Connor Mallon, Santiago Badia, Vivien J. Challis","doi":"arxiv-2405.10478","DOIUrl":"https://doi.org/arxiv-2405.10478","url":null,"abstract":"In this paper we present GridapTopOpt, an extendable framework for level set-based topology optimisation that can be readily distributed across a personal computer or high-performance computing cluster. The package is written in Julia and uses the Gridap package ecosystem for parallel finite element assembly from arbitrary weak formulations of partial differential equations (PDEs) along with the scalable solvers from the Portable, Extensible Toolkit for Scientific Computation (PETSc). The resulting user interface is intuitive and easy to use, allowing for the implementation of a wide range of topology optimisation problems with a syntax that is near one-to-one with the mathematical notation. Furthermore, we implement automatic differentiation to help mitigate the bottleneck associated with the analytic derivation of sensitivities for complex problems. GridapTopOpt is capable of solving a range of benchmark and research topology optimisation problems with large numbers of degrees of freedom. This educational article demonstrates the usability and versatility of the package by describing the formulation and step-by-step implementation of several distinct topology optimisation problems. The driver scripts for these problems are provided and the package source code is available at https://github.com/zjwegert/GridapTopOpt.jl.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141146807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PyOptInterface: Design and implementation of an efficient modeling language for mathematical optimization","authors":"Yue Yang, Chenhui Lin, Luo Xu, Wenchuan Wu","doi":"arxiv-2405.10130","DOIUrl":"https://doi.org/arxiv-2405.10130","url":null,"abstract":"This paper introduces the design and implementation of PyOptInterface, a modeling language for mathematical optimization embedded in the Python programming language. PyOptInterface uses lightweight and compact data structures to bridge high-level entities in optimization models like variables and constraints to internal indices of optimizers efficiently. It supports a variety of optimization solvers and a range of common problem classes. We provide benchmarks to exhibit the competitive performance of PyOptInterface compared with other state-of-the-art modeling languages.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141060914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local Adjoints for Simultaneous Preaccumulations with Shared Inputs","authors":"Johannes Blühdorn, Nicolas R. Gauger","doi":"arxiv-2405.07819","DOIUrl":"https://doi.org/arxiv-2405.07819","url":null,"abstract":"In shared-memory parallel automatic differentiation, shared inputs among simultaneous thread-local preaccumulations lead to data races if Jacobians are accumulated with a single, shared vector of adjoint variables. In this work, we discuss the benefits and tradeoffs of re-enabling such preaccumulations by a transition to suitable local adjoint variables. In particular, we assess the performance of mapped local adjoints in discrete adjoint computations in the multiphysics simulation suite SU2.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140937591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Johannes Blühdorn, Pedro Gomes, Max Aehle, Nicolas R. Gauger
{"title":"Hybrid Parallel Discrete Adjoints in SU2","authors":"Johannes Blühdorn, Pedro Gomes, Max Aehle, Nicolas R. Gauger","doi":"arxiv-2405.06056","DOIUrl":"https://doi.org/arxiv-2405.06056","url":null,"abstract":"The open-source multiphysics suite SU2 features discrete adjoints by means of operator overloading automatic differentiation (AD). While both primal and discrete adjoint solvers support MPI parallelism, hybrid parallelism using both MPI and OpenMP has only been introduced for the primal solvers so far. In this work, we enable hybrid parallel discrete adjoint solvers. Coupling SU2 with OpDiLib, an add-on for operator overloading AD tools that extends AD to OpenMP parallelism, marks a key step in this endeavour. We identify the affected parts of SU2's advanced AD workflow and discuss the required changes and their tradeoffs. Detailed performance studies compare MPI parallel and hybrid parallel discrete adjoints in terms of memory and runtime and unveil key performance characteristics. We showcase the effectiveness of performance optimizations and highlight perspectives for future improvements. At the same time, this study demonstrates the applicability of OpDiLib in a large code base and its scalability on large test cases, providing valuable insights for future applications both within and beyond SU2.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"208 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140937751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tugba Torun, Eren Yenigul, Ameer Taweel, Didem Unat
{"title":"A Sparse Tensor Generator with Efficient Feature Extraction","authors":"Tugba Torun, Eren Yenigul, Ameer Taweel, Didem Unat","doi":"arxiv-2405.04944","DOIUrl":"https://doi.org/arxiv-2405.04944","url":null,"abstract":"Sparse tensor operations are gaining attention in emerging applications such as social networks, deep learning, diagnosis, crime, and review analysis. However, a major obstacle for research in sparse tensor operations is the deficiency of a broad-scale sparse tensor dataset. Another challenge in sparse tensor operations is examining the sparse tensor features, which are not only important for revealing its nonzero pattern but also have a significant impact on determining the best-suited storage format, the decomposition algorithm, and the reordering methods. However, due to the large sizes of real tensors, even extracting these features becomes costly without caution. To address these gaps in the literature, we have developed a smart sparse tensor generator that mimics the substantial features of real sparse tensors. Moreover, we propose various methods for efficiently extracting an extensive set of features for sparse tensors. The effectiveness of our generator is validated through the quality of features and the performance of decomposition in the generated tensors. Both the sparse tensor feature extractor and the tensor generator are open source with all the artifacts available at https://github.com/sparcityeu/feaTen and https://github.com/sparcityeu/genTen, respectively.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140937567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of H-Matrix-Vector Multiplication with Floating Point Compression","authors":"Ronald Kriemann","doi":"arxiv-2405.03456","DOIUrl":"https://doi.org/arxiv-2405.03456","url":null,"abstract":"Matrix-vector multiplication forms the basis of many iterative solution algorithms and as such is an important algorithm also for hierarchical matrices. However, due to its low computational intensity, its performance is typically limited by the available memory bandwidth. By optimizing the storage representation of the data within such matrices, this limitation can be lifted and the performance increased. This applies not only to hierarchical matrices but also to other low-rank approximation schemes, e.g. block low-rank matrices.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140883052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimization of Nonlinear Energies in Python Using FEM and Automatic Differentiation Tools","authors":"Michal Béreš, Jan Valdman","doi":"arxiv-2407.04706","DOIUrl":"https://doi.org/arxiv-2407.04706","url":null,"abstract":"This contribution examines the capabilities of the Python ecosystem to solve nonlinear energy minimization problems, with a particular focus on transitioning from traditional MATLAB methods to Python's advanced computational tools, such as automatic differentiation. We demonstrate Python's streamlined approach to minimizing nonlinear energies by analyzing three problem benchmarks - the p-Laplacian, the Ginzburg-Landau model, and the Neo-Hookean hyperelasticity. This approach merely requires the provision of the energy functional itself, making it a simple and efficient way to solve this category of problems. The results show that the implementation is about ten times faster than the MATLAB implementation for large-scale problems. Our findings highlight Python's efficiency and ease of use in scientific computing, establishing it as a preferable choice for implementing sophisticated mathematical models and accelerating the development of numerical simulations.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141571851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finch: Sparse and Structured Array Programming with Control Flow","authors":"Willow Ahrens, Teodoro Fields Collin, Radha Patel, Kyle Deeds, Changwan Hong, Saman Amarasinghe","doi":"arxiv-2404.16730","DOIUrl":"https://doi.org/arxiv-2404.16730","url":null,"abstract":"From FORTRAN to NumPy, arrays have revolutionized how we express computation. However, arrays in these, and almost all prominent systems, can only handle dense rectilinear integer grids. Real world arrays often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Support for structured data is fragmented and incomplete. Existing frameworks limit the array structures and program control flow they support to better simplify the problem. In this work, we propose a new programming language, Finch, which supports both flexible control flow and diverse data structures. Finch facilitates a programming model which resolves the challenges of computing over structured arrays by combining control flow and data structures into a common representation where they can be co-optimized. Finch automatically specializes control flow to data so that performance engineers can focus on experimenting with many algorithms. Finch supports a familiar programming language of loops, statements, ifs, breaks, etc., over a wide variety of array structures, such as sparsity, run-length-encoding, symmetry, triangles, padding, or blocks. Finch reliably utilizes the key properties of structure, such as structural zeros, repeated values, or clustered non-zeros. We show that this leads to dramatic speedups in operations such as SpMV and SpGEMM, image processing, graph analytics, and a high-level tensor operator fusion interface.","PeriodicalId":501256,"journal":{"name":"arXiv - CS - Mathematical Software","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140800855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}