{"title":"Generative diffusion models in infinite dimensions: a survey.","authors":"Giulio Franzese, Pietro Michiardi","doi":"10.1098/rsta.2024.0322","DOIUrl":"10.1098/rsta.2024.0322","url":null,"abstract":"<p><p>Diffusion models have recently emerged as a powerful class of generative models, achieving state-of-the-art performance in various domains such as image and audio synthesis. While most existing work focuses on finite-dimensional data, there is growing interest in extending diffusion models to infinite-dimensional function spaces. This survey provides a comprehensive overview of the theoretical foundations and practical applications of diffusion models in infinite dimensions. We review the necessary background on stochastic differential equations in Hilbert spaces, and then discuss different approaches to defining generative models rooted in this formalism. Finally, we survey recent applications of infinite-dimensional diffusion models in areas such as generative modelling for function spaces, conditional generation of functional data and solving inverse problems. Throughout the survey, we highlight the connections between different approaches and discuss open problems and future research directions. This article is part of the theme issue 'Generative modelling meets Bayesian inference: a new paradigm for inverse problems'.</p>","PeriodicalId":19879,"journal":{"name":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences","volume":"383 2299","pages":"20240322"},"PeriodicalIF":4.3,"publicationDate":"2025-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12201592/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144326522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inverse evolution data augmentation for neural PDE solvers.","authors":"Chaoyu Liu, Chris Budd, Carola-Bibiane Schönlieb","doi":"10.1098/rsta.2024.0242","DOIUrl":"https://doi.org/10.1098/rsta.2024.0242","url":null,"abstract":"<p><p>Neural networks have emerged as promising tools for solving partial differential equations (PDEs), particularly through the application of neural operators. Training neural operators typically requires a large amount of training data to ensure accuracy and generalization. In this article, we propose a novel data augmentation method specifically designed for training neural operators on evolution equations. Our approach utilizes insights from the inverse processes of these equations to efficiently generate data from random initializations, which are combined with the original data. To further enhance the accuracy of the augmented data, we introduce high-order inverse evolution schemes. These schemes consist of only a few explicit computation steps, yet the resulting data pairs can be proven to satisfy the corresponding implicit numerical schemes. In contrast to traditional PDE solvers that require small time steps or implicit schemes to guarantee accuracy, our data augmentation method employs explicit schemes with relatively large time steps, thereby significantly reducing computational costs. Experiments confirm the accuracy and efficacy of our approach. In addition, we validate our approach through experiments with the Fourier neural operator (FNO) and UNet on three common evolution equations: Burgers' equation, the Allen-Cahn equation and the Navier-Stokes equation. The results demonstrate a significant improvement in the performance and robustness of the FNO when coupled with our inverse evolution data augmentation method. This article is part of the theme issue 'Partial differential equations in data science'.</p>","PeriodicalId":19879,"journal":{"name":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences","volume":"383 2298","pages":"20240242"},"PeriodicalIF":4.3,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144226170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Isotropic Q-fractional Brownian motion on the sphere: regularity and fast simulation.","authors":"Annika Lang, Björn Müller","doi":"10.1098/rsta.2024.0238","DOIUrl":"10.1098/rsta.2024.0238","url":null,"abstract":"<p><p>As an extension of isotropic Gaussian random fields and [Formula: see text]-Wiener processes on [Formula: see text]-dimensional spheres, isotropic [Formula: see text]-fractional Brownian motion is introduced, and sample Hölder regularity in space-time is shown depending on the regularity of the spatial covariance operator [Formula: see text] and the Hurst parameter [Formula: see text]. The processes are approximated by a spectral method in space, for which strong and almost sure convergence are shown. The underlying sample paths of fractional Brownian motion are simulated by circulant embedding or conditionalized random midpoint displacement. Temporal accuracy and computational complexity are numerically tested, the latter matching the complexity of simulating a [Formula: see text]-Wiener process if allowing for a temporal error. This article is part of the theme issue 'Partial differential equations in data science'.</p>","PeriodicalId":19879,"journal":{"name":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences","volume":"383 2298","pages":"20240238"},"PeriodicalIF":4.3,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12139523/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144226171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Wasserstein gradient flows of maximum mean discrepancy functionals with distance kernels under Sobolev regularization.","authors":"Richard Duong, Nicolaj Rux, Viktor Stein, Gabriele Steidl","doi":"10.1098/rsta.2024.0243","DOIUrl":"https://doi.org/10.1098/rsta.2024.0243","url":null,"abstract":"<p><p>We consider Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals, [Formula: see text] for positive and negative distance kernels [Formula: see text] and given target measures [Formula: see text] on [Formula: see text]. Since in one dimension, the Wasserstein space can be isometrically embedded into the cone [Formula: see text] of quantile functions, Wasserstein gradient flows can be characterized by the solution of an associated Cauchy problem on [Formula: see text]. While for the negative kernel, the MMD functional is geodesically convex, this is not the case for the positive kernel, which must be handled to ensure the existence of the flow. We propose to add a regularizing Sobolev term [Formula: see text] corresponding to the Laplacian with Neumann boundary conditions to the Cauchy problem of quantile functions. Indeed, this ensures the existence of a generalized minimizing movement (GMM) for the positive kernel. Furthermore, for the negative kernel, we demonstrate by numerical examples how the Laplacian rectifies a 'dissipation-of-mass' defect of the MMD gradient flow. This article is part of the theme issue 'Partial differential equations in data science'.</p>","PeriodicalId":19879,"journal":{"name":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences","volume":"383 2298","pages":"20240243"},"PeriodicalIF":4.3,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144226173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Closing the ODE-SDE gap in score-based diffusion models through the Fokker-Planck equation.","authors":"Teo Deveney, Jan Stanczuk, Lisa Kreusser, Chris Budd, Carola-Bibiane Schönlieb","doi":"10.1098/rsta.2024.0503","DOIUrl":"10.1098/rsta.2024.0503","url":null,"abstract":"<p><p>Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to both their mathematical foundations and their state-of-the-art performance in many tasks. Empirically, it has been reported that samplers based on ordinary differential equations (ODEs) are inferior to those based on stochastic differential equations (SDEs). In this article, we systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models and show how this relates to an associated Fokker-Planck equation. We rigorously describe the full range of dynamics and approximations arising when training score-based diffusion models and derive a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker-Planck residual. We also show numerically that conventional score-based diffusion models can exhibit significant differences between ODE- and SDE-induced distributions, which we demonstrate using explicit comparisons. Moreover, we show numerically that reducing this Fokker-Planck residual by adding it as an additional regularization term during training closes the gap between ODE- and SDE-induced distributions. Our experiments suggest that this regularization can improve the distribution generated by the ODE; however, this can come at the cost of degraded SDE sample quality. This article is part of the theme issue 'Partial differential equations in data science'.</p>","PeriodicalId":19879,"journal":{"name":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences","volume":"383 2298","pages":"20240503"},"PeriodicalIF":4.3,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12139524/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144226163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Equivariant geometric convolutions for dynamical systems on vector and tensor images.","authors":"Wilson G Gregory, David W Hogg, Ben Blum-Smith, Maria Teresa Arias, Kaze W K Wong, Soledad Villar","doi":"10.1098/rsta.2024.0247","DOIUrl":"10.1098/rsta.2024.0247","url":null,"abstract":"<p><p>Machine learning methods are increasingly being employed as surrogate models in place of computationally expensive and slow numerical integrators for a bevy of applications in the natural sciences. However, while the laws of physics are relationships between scalars, vectors and tensors that hold regardless of the frame of reference or chosen coordinate system, surrogate machine learning models are not coordinate-free by default. We enforce coordinate freedom by using geometric convolutions in three model architectures: a ResNet, a Dilated ResNet and a UNet. In numerical experiments emulating the two-dimensional compressible Navier-Stokes equations, we see better accuracy and improved stability compared with baseline surrogate models in almost all cases. The ease of enforcing coordinate freedom without making major changes to the model architecture provides an exciting recipe for any convolutional neural network-based method applied to an appropriate class of problems. This article is part of the theme issue 'Partial differential equations in data science'.</p>","PeriodicalId":19879,"journal":{"name":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences","volume":"383 2298","pages":"20240247"},"PeriodicalIF":4.3,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12139525/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144226168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defending against diverse attacks in federated learning through consensus-based bi-level optimization.","authors":"Nicolás García Trillos, Aditya Kumar Akash, Sixu Li, Konstantin Riedl, Yuhua Zhu","doi":"10.1098/rsta.2024.0235","DOIUrl":"https://doi.org/10.1098/rsta.2024.0235","url":null,"abstract":"<p><p>Adversarial attacks pose significant challenges in many machine learning applications, particularly in the setting of distributed training and federated learning, where malicious agents seek to corrupt the training process with the goal of compromising the performance and reliability of the final models. In this paper, we address the problem of robust federated learning in the presence of such attacks by formulating the training task as a bi-level optimization problem. We conduct a theoretical analysis of the resilience of consensus-based bi-level optimization (CB<sup>2</sup>O), an interacting multi-particle metaheuristic optimization method, in adversarial settings. Specifically, we provide a global convergence analysis of CB<sup>2</sup>O in mean-field law in the presence of malicious agents, demonstrating the robustness of CB<sup>2</sup>O against a diverse range of attacks. Thereby, we offer insights into how specific hyperparameter choices enable the mitigation of adversarial effects. On the practical side, we extend CB<sup>2</sup>O to the clustered federated learning setting by proposing FedCB<sup>2</sup>O, a novel interacting multi-particle system, and design a practical algorithm that addresses the demands of real-world applications. Extensive experiments demonstrate the robustness of the FedCB<sup>2</sup>O algorithm against label-flipping attacks in decentralized clustered federated learning scenarios, showcasing its effectiveness in practical contexts. This article is part of the theme issue 'Partial differential equations in data science'.</p>","PeriodicalId":19879,"journal":{"name":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences","volume":"383 2298","pages":"20240235"},"PeriodicalIF":4.3,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144226166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble of weak spectral total-variation learners: a PET-CT case study.","authors":"Anna Rosenberg, John Kennedy, Zohar Keidar, Yehoshua Y Zeevi, Guy Gilboa","doi":"10.1098/rsta.2024.0236","DOIUrl":"https://doi.org/10.1098/rsta.2024.0236","url":null,"abstract":"<p><p>When solving computer vision problems through machine learning, one often encounters a lack of sufficient training data. To mitigate this, we propose the use of ensembles of weak learners based on spectral total-variation (STV) features (Gilboa G. 2014 A total variation spectral framework for scale and texture analysis. <i>SIAM J. Imaging Sci</i>. <b>7</b>, 1937-1961. (doi:10.1137/130930704)). The features are related to nonlinear eigenfunctions of the total-variation subgradient and can characterize textures well at various scales. It was shown (Burger M, Gilboa G, Moeller M, Eckardt L, Cremers D. 2016 Spectral decompositions using one-homogeneous functionals. <i>SIAM J. Imaging Sci</i>. <b>9</b>, 1374-1408. (doi:10.1137/15m1054687)) that, in the one-dimensional case, orthogonal features are generated, whereas in two dimensions the features are empirically lowly correlated. Ensemble learning theory advocates the use of lowly correlated weak learners. We thus propose here to design ensembles using learners based on STV features. To show the effectiveness of this paradigm, we examine a hard real-world medical imaging problem: the predictive value of computed tomography (CT) data for high uptake in positron emission tomography (PET) for patients suspected of skeletal metastases. The database consists of 457 scans with 1524 unique pairs of registered CT and PET slices. Our approach is compared with deep-learning methods and with radiomics features, showing that STV learners perform best (AUC=[Formula: see text]), compared with neural nets (AUC=[Formula: see text]) and radiomics (AUC=[Formula: see text]). We observe that fine STV scales in CT images are especially indicative of the presence of high uptake in PET. This article is part of the theme issue 'Partial differential equations in data science'.</p>","PeriodicalId":19879,"journal":{"name":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences","volume":"383 2298","pages":"20240236"},"PeriodicalIF":4.3,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144226167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Connections between sequential Bayesian inference and evolutionary dynamics.","authors":"Sahani Pathiraja, Philipp Wacker","doi":"10.1098/rsta.2024.0241","DOIUrl":"10.1098/rsta.2024.0241","url":null,"abstract":"<p><p>It has long been posited that there is a connection between the dynamical equations describing evolutionary processes in biology and sequential Bayesian learning methods. This manuscript describes new research in which this precise connection is rigorously established in the continuous-time setting. Here we focus on a partial differential equation known as the Kushner-Stratonovich equation, describing the evolution of the posterior density in time. Of particular importance is a piecewise smooth approximation of the observation path, from which the discrete-time filtering equations are derived; these are shown to converge to a Stratonovich interpretation of the Kushner-Stratonovich equation. This smooth formulation is then used to draw precise connections between nonlinear stochastic filtering and replicator-mutator dynamics. Additionally, gradient flow formulations are investigated, as well as a form of replicator-mutator dynamics that is shown to be beneficial for the misspecified model filtering problem. It is hoped this work will spur further research into exchanges between sequential learning and evolutionary biology and inspire new algorithms in filtering and sampling. This article is part of the theme issue 'Partial differential equations in data science'.</p>","PeriodicalId":19879,"journal":{"name":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences","volume":"383 2298","pages":"20240241"},"PeriodicalIF":4.3,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12152925/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144226164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of mean-field models arising from self-attention dynamics in transformer architectures with layer normalization.","authors":"Martin Burger, Samira Kabri, Yury Korolev, Tim Roith, Lukas Weigand","doi":"10.1098/rsta.2024.0233","DOIUrl":"10.1098/rsta.2024.0233","url":null,"abstract":"<p><p>The aim of this article is to provide a mathematical analysis of transformer architectures using a self-attention mechanism with layer normalization. In particular, observed patterns in such architectures resembling either clusters or uniform distributions pose a number of challenging mathematical questions. We focus on a special case that admits a gradient flow formulation in the spaces of probability measures on the unit sphere under a special metric, which allows us to give at least partial answers in a rigorous way. The arising mathematical problems resemble those recently studied in aggregation equations, but with additional challenges emerging from restricting the dynamics to the sphere and the particular form of the interaction energy. We provide a rigorous framework for studying the gradient flow, which also suggests a possible metric geometry to study the general case (i.e. one that is not described by a gradient flow). We further analyse the stationary points of the induced self-attention dynamics. The latter are related to stationary points of the interaction energy in the Wasserstein geometry, and we further discuss energy minimizers and maximizers in different parameter settings. This article is part of the theme issue 'Partial differential equations in data science'.</p>","PeriodicalId":19879,"journal":{"name":"Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences","volume":"383 2298","pages":"20240233"},"PeriodicalIF":4.3,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12152857/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144226161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}