{"title":"Probabilistic learning on manifolds","authors":"Christian Soize, R. Ghanem","doi":"10.3934/fods.2020013","DOIUrl":"https://doi.org/10.3934/fods.2020013","url":null,"abstract":"This paper presents mathematical results in support of the methodology of the probabilistic learning on manifolds (PLoM) recently introduced by the authors, which has been used with success for analyzing complex engineering systems. The PLoM considers a given initial dataset constituted of a small number of points given in a Euclidean space, which are interpreted as independent realizations of a vector-valued random variable whose non-Gaussian probability measure is unknown but is, a priori, concentrated in an unknown subset of the Euclidean space. The objective is to construct a learned dataset constituted of additional realizations that allow the evaluation of converged statistics. A transport of the probability measure estimated with the initial dataset is done through a linear transformation constructed using a reduced-order diffusion-maps basis. In this paper, it is proven that this transported measure is a marginal distribution of the invariant measure of a reduced-order Ito stochastic differential equation that corresponds to a dissipative Hamiltonian dynamical system. This construction allows for preserving the concentration of the probability measure. This property is shown by analyzing a distance between the random matrix constructed with the PLoM and the matrix representing the initial dataset, as a function of the dimension of the basis. It is further proven that this distance has a minimum for a dimension of the reduced-order diffusion-maps basis that is strictly smaller than the number of points in the initial dataset. Finally, a brief numerical application illustrates the mathematical results.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44044177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multilevel Ensemble Kalman Filtering based on a sample average of independent EnKF estimators","authors":"Håkon Hoel, G. Shaimerdenova, R. Tempone","doi":"10.3934/fods.2020017","DOIUrl":"https://doi.org/10.3934/fods.2020017","url":null,"abstract":"We introduce a new multilevel ensemble Kalman filter method (MLEnKF) which consists of a hierarchy of independent samples of ensemble Kalman filters (EnKF). This new MLEnKF method is fundamentally different from the preexisting method introduced by Hoel, Law and Tempone in 2016, and it is suitable for extensions towards multi-index Monte Carlo based filtering methods. Robust theoretical analysis and supporting numerical examples show that under appropriate regularity assumptions, the MLEnKF method has better complexity than plain vanilla EnKF in the large-ensemble and fine-resolution limits, for weak approximations of quantities of interest. The method is developed for discrete-time filtering problems with finite-dimensional state space and linear observations polluted by additive Gaussian noise.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44833004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Index","authors":"","doi":"10.1017/9781108755528.013","DOIUrl":"https://doi.org/10.1017/9781108755528.013","url":null,"abstract":"","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1017/9781108755528.013","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45770402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction","authors":"","doi":"10.1017/9781108755528.001","DOIUrl":"https://doi.org/10.1017/9781108755528.001","url":null,"abstract":"","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1017/9781108755528.001","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43733045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Dimensional Space","authors":"","doi":"10.1017/9781108755528.002","DOIUrl":"https://doi.org/10.1017/9781108755528.002","url":null,"abstract":"","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1017/9781108755528.002","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46146576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization","authors":"M. Bocquet, J. Brajard, A. Carrassi, Laurent Bertino","doi":"10.3934/fods.2020004","DOIUrl":"https://doi.org/10.3934/fods.2020004","url":null,"abstract":"The reconstruction from observations of high-dimensional chaotic dynamics such as geophysical flows is hampered by (i) the partial and noisy observations that can realistically be obtained, (ii) the need to learn from long time series of data, and (iii) the unstable nature of the dynamics. To achieve such inference from the observations over long time series, it has been suggested to combine data assimilation and machine learning in several ways. We show how to unify these approaches from a Bayesian perspective using expectation-maximization and coordinate descents. In doing so, the model, the state trajectory and model error statistics are estimated all together. Implementations and approximations of these methods are discussed. Finally, we numerically and successfully test the approach on two relevant low-order chaotic models with distinct identifiability.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49478111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mean-field and kinetic descriptions of neural differential equations","authors":"M. Herty, T. Trimborn, G. Visconti","doi":"10.3934/fods.2022007","DOIUrl":"https://doi.org/10.3934/fods.2022007","url":null,"abstract":"Nowadays, neural networks are widely used in many applications as artificial intelligence models for learning tasks. Since typically neural networks process a very large amount of data, it is convenient to formulate them within the mean-field and kinetic theory. In this work we focus on a particular class of neural networks, i.e. the residual neural networks, assuming that each layer is characterized by the same number of neurons $N$, which is fixed by the dimension of the data. This assumption allows one to interpret the residual neural network as a time-discretized ordinary differential equation, in analogy with neural differential equations. The mean-field description is then obtained in the limit of infinitely many input data. This leads to a Vlasov-type partial differential equation which describes the evolution of the distribution of the input data. We analyze steady states and sensitivity with respect to the parameters of the network, namely the weights and the bias. In the simple setting of a linear activation function and one-dimensional input data, the study of the moments provides insights on the choice of the parameters of the network. Furthermore, a modification of the microscopic dynamics, inspired by stochastic residual neural networks, leads to a Fokker-Planck formulation of the network, in which the concept of network training is replaced by the task of fitting distributions. The performed analysis is validated by artificial numerical simulations. In particular, results on classification and regression problems are presented.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42109967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topological reconstruction of sub-cellular motion with Ensemble Kalman velocimetry","authors":"Le Yin, Ioannis Sgouralis, V. Maroulas","doi":"10.3934/fods.2020007","DOIUrl":"https://doi.org/10.3934/fods.2020007","url":null,"abstract":"Microscopy imaging of plant cells allows the elaborate analysis of sub-cellular motions of organelles. The large video data set can be efficiently analyzed by automated algorithms. We develop a novel, data-oriented algorithm, which can track organelle movements and reconstruct their trajectories on stacks of image data. Our method proceeds with three steps: (ⅰ) identification, (ⅱ) localization, and (ⅲ) linking. This method combines topological data analysis and Ensemble Kalman Filtering, and does not assume a specific motion model. Application of this method on simulated data sets shows an agreement with ground truth. We also successfully test our method on real microscopy data.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70247921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stochastic gradient descent algorithm for stochastic optimization in solving analytic continuation problems","authors":"F. Bao, T. Maier","doi":"10.3934/fods.2020001","DOIUrl":"https://doi.org/10.3934/fods.2020001","url":null,"abstract":"We propose a stochastic gradient descent based optimization algorithm to solve the analytic continuation problem in which we extract real frequency spectra from imaginary time Quantum Monte Carlo data. The procedure of analytic continuation is an ill-posed inverse problem which is usually solved by regularized optimization methods, such as the Maximum Entropy method, or stochastic optimization methods. The main contribution of this work is to improve the performance of stochastic optimization approaches by introducing a supervised stochastic gradient descent algorithm to solve a flipped inverse system which processes the random solutions obtained by a type of Fast and Efficient Stochastic Optimization Method.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70247865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}