Accurate Computation of the Logarithm of Modified Bessel Functions on GPUs

Andreas Plesner, Hans Henrik Brandenborg Sørensen, Søren Hauberg

arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-13. DOI: arxiv-2409.08729

Citations: 0
Abstract
Bessel functions are critical in scientific computing for applications such
as machine learning, protein structure modeling, and robotics. However,
currently available routines lack precision or fail for certain input ranges,
such as when the order $v$ is large, and GPU-specific implementations are
limited. We address the precision limitations of current numerical
implementations while dramatically improving the runtime. We propose two novel
algorithms for computing the logarithm of modified Bessel functions of the
first and second kinds by computing intermediate values on a logarithmic scale.
Our algorithms are robust and never have issues with underflows or overflows
while having relative errors on the order of machine precision, even for inputs
where existing libraries fail. In C++/CUDA, our algorithms have median and
maximum speedups of 45x and 6150x for GPU and 17x and 3403x for CPU,
respectively, over the ranges of inputs and third-party libraries tested.
Compared to SciPy, the algorithms have median and maximum speedups of 77x and
300x for GPU and 35x and 98x for CPU, respectively, over the tested inputs.

The ability to robustly compute a solution and the low relative errors allow
us to fit von Mises-Fisher (vMF) distributions to high-dimensional neural
network features. This is, e.g., relevant for uncertainty quantification in
metric learning. We obtain image feature data by processing CIFAR10 training
images with the convolutional layers of a pre-trained ResNet50. We successfully
fit vMF distributions to 2048-, 8192-, and 32768-dimensional image feature data
using our algorithms. Our approach provides fast and accurate results while
existing implementations in SciPy and mpmath fail to fit successfully.

Our approach is readily implementable on GPUs, and we provide a fast
open-source implementation alongside this paper.
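The core idea of working on a logarithmic scale, so that the result stays representable even when $I_v(x)$ or $K_v(x)$ itself would overflow or underflow, can be sketched with SciPy's exponentially scaled routines. This is an illustrative approximation of the idea, not the paper's algorithm, and it inherits SciPy's failure mode for large orders:

```python
import numpy as np
from scipy.special import ive, kve  # exponentially scaled modified Bessel functions

def log_iv(v, x):
    """log I_v(x) via the scaled function: ive(v, x) = I_v(x) * exp(-|x|)."""
    return np.log(ive(v, x)) + np.abs(x)

def log_kv(v, x):
    """log K_v(x) via the scaled function: kve(v, x) = K_v(x) * exp(x)."""
    return np.log(kve(v, x)) - x

def vmf_log_norm(d, kappa):
    """log normalizing constant of a d-dimensional von Mises-Fisher distribution:
    C_d(kappa) = kappa^(d/2 - 1) / ((2 pi)^(d/2) * I_{d/2 - 1}(kappa))."""
    v = d / 2.0 - 1.0
    return v * np.log(kappa) - (d / 2.0) * np.log(2.0 * np.pi) - log_iv(v, kappa)

# I_2(1000) ~ e^995 overflows double precision, but its logarithm is benign:
print(log_iv(2.0, 1000.0))   # ~995.6
print(vmf_log_norm(3, 2.0))  # matches the closed form log(kappa / (4 pi sinh(kappa)))

# The limitation the paper targets: for large order the *scaled* value itself
# may underflow (e.g. ive(1023.0, 500.0), arising in the d = 2048 vMF case), so
# this sketch returns -inf where a true log-scale algorithm stays finite.
```

Note that `vmf_log_norm` is a hypothetical helper named here for illustration; the fitting of 2048- to 32768-dimensional vMF distributions reported above requires the paper's algorithms precisely because the scaled-function shortcut breaks down at those orders.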