{"title":"HERMES: PERSISTENT SPECTRAL GRAPH SOFTWARE.","authors":"Rui Wang, Rundong Zhao, Emily Ribando-Gros, Jiahui Chen, Yiying Tong, Guo-Wei Wei","doi":"10.3934/fods.2021006","DOIUrl":"10.3934/fods.2021006","url":null,"abstract":"<p><p>Persistent homology (PH) is one of the most popular tools in topological data analysis (TDA), while graph theory has had a significant impact on data science. Our earlier work introduced the persistent spectral graph (PSG) theory as a unified multiscale paradigm to encompass TDA and geometric analysis. In PSG theory, families of persistent Laplacian matrices (PLMs) corresponding to various topological dimensions are constructed via a filtration to sample a given dataset at multiple scales. The harmonic spectra from the null spaces of PLMs offer the same topological invariants, namely persistent Betti numbers, at various dimensions as those provided by PH, while the non-harmonic spectra of PLMs give rise to additional geometric analysis of the shape of the data. In this work, we develop an open-source software package, called highly efficient robust multidimensional evolutionary spectra (HERMES), to enable broad applications of PSGs in science, engineering, and technology. To ensure the reliability and robustness of HERMES, we have validated the software with simple geometric shapes and complex datasets from three-dimensional (3D) protein structures. We found that the smallest non-zero eigenvalues are very sensitive to data abnormality.</p>","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8411887/pdf/nihms-1717421.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39387483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emmanuel Fleurantin, C. Sampson, Daniel P. Maes, Justin P. Bennett, Tayler Fernandes-Nunez, S. Marx, G. Evensen
{"title":"A study of disproportionately affected populations by race/ethnicity during the SARS-CoV-2 pandemic using multi-population SEIR modeling and ensemble data assimilation","authors":"Emmanuel Fleurantin, C. Sampson, Daniel P. Maes, Justin P. Bennett, Tayler Fernandes-Nunez, S. Marx, G. Evensen","doi":"10.3934/fods.2021022","DOIUrl":"https://doi.org/10.3934/fods.2021022","url":null,"abstract":"<p style='text-indent:20px;'>The disparity in the impact of COVID-19 on minority populations in the United States has been well established in the available data on deaths, case counts, and adverse outcomes. However, critical metrics used by public health officials and epidemiologists, such as a time dependent viral reproductive number (<inline-formula><tex-math id=\"M1\">begin{document}$ R_t $end{document}</tex-math></inline-formula>), can be hard to calculate from this data especially for individual populations. Furthermore, disparities in the availability of testing, record keeping infrastructure, or government funding in disadvantaged populations can produce incomplete data sets. In this work, we apply ensemble data assimilation techniques which optimally combine model and data to produce a more complete data set providing better estimates of the critical metrics used by public health officials and epidemiologists. We employ a multi-population SEIR (Susceptible, Exposed, Infected and Recovered) model with a time dependent reproductive number and age stratified contact rate matrix for each population. We assimilate the daily death data for populations separated by ethnic/racial groupings using a technique called Ensemble Smoothing with Multiple Data Assimilation (ESMDA) to estimate model parameters and produce an <inline-formula><tex-math id=\"M10000\">begin{document}$R_t(n)$end{document}</tex-math></inline-formula> for the <inline-formula><tex-math id=\"M2000\">begin{document}$n^{th}$end{document}</tex-math></inline-formula> population. We do this with three distinct approaches, (1) using the same contact matrices and prior <inline-formula><tex-math id=\"M30000\">begin{document}$R_t(n)$end{document}</tex-math></inline-formula> for each population, (2) assigning contact matrices with increased contact rates for working age and older adults to populations experiencing disparity and (3) as in (2) but with a time-continuous update to <inline-formula><tex-math id=\"M4\">begin{document}$R_t(n)$end{document}</tex-math></inline-formula>. We make a study of 9 U.S. states and the District of Columbia providing a complete time series of the pandemic in each and, in some cases, identifying disparities not otherwise evident in the aggregate statistics.</p>","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70248403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intrinsic disease maps using persistent cohomology","authors":"Daniel Amin, Mikael Vejdemo-Johansson","doi":"10.3934/FODS.2021008","DOIUrl":"https://doi.org/10.3934/FODS.2021008","url":null,"abstract":"We use persistent cohomology and circular coordinates to investigate three datasets related to infectious diseases. We show that all three datasets exhibit circular coordinates that carry information about the data itself. For one of the datasets we are able to recover time post infection from the circular coordinate itself – for the other datasets, this information was not available, but in one we were able to relate the circular coordinate to red blood cell counts and weight changes in the subjects.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70248273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Christopher Oballe, D. Boothe, P. Franaszczuk, V. Maroulas
{"title":"ToFU: Topology functional units for deep learning","authors":"Christopher Oballe, D. Boothe, P. Franaszczuk, V. Maroulas","doi":"10.3934/fods.2021021","DOIUrl":"https://doi.org/10.3934/fods.2021021","url":null,"abstract":"We propose ToFU, a new trainable neural network unit with a persistence diagram dissimilarity function as its activation. Since persistence diagrams are topological summaries of structures, this new activation measures and learns the topology of data to leverage it in machine learning tasks. We showcase the utility of ToFU in two experiments: one involving the classification of discrete-time autoregressive signals, and another involving a variational autoencoder. In the former, ToFU yields competitive results with networks that use spectral features while outperforming CNN architectures. In the latter, ToFU produces topologically-interpretable latent space representations of inputs without sacrificing reconstruction fidelity.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70248392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Lawson, Tyler Hoffman, Yu-Min Chung, K. Keegan, S. Day
{"title":"A density-based approach to feature detection in persistence diagrams for firn data","authors":"A. Lawson, Tyler Hoffman, Yu-Min Chung, K. Keegan, S. Day","doi":"10.3934/FODS.2021012","DOIUrl":"https://doi.org/10.3934/FODS.2021012","url":null,"abstract":"Topological data analysis, and in particular persistence diagrams, are gaining popularity as tools for extracting topological information from noisy point cloud and digital data. Persistence diagrams track topological features in the form of begin{document}$ k $end{document} -dimensional holes in the data. Here, we construct a new, automated approach for identifying persistence diagram points that represent robust long-life features. These features may be used to provide a more accurate estimate of Betti numbers for the underlying space. This approach extends the established practice of using a lifespan cutoff on the features in order to take advantage of the observation that noisy features typically appear in clusters in the persistence diagram. We show that this approach offers more flexibility in partitioning features in the persistence diagram, resulting in greater accuracy in computed Betti numbers, especially in the case of high noise levels and varying image illumination. This work is motivated by 3-dimensional Micro-CT imaging of ice core samples, and is applicable for separating noise from robust signals in persistence diagrams from noisy data.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70248282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paul E. Anderson, T. Chartier, A. Langville, Kathryn E. Pedings-Behling
{"title":"The rankability of weighted data from pairwise comparisons","authors":"Paul E. Anderson, T. Chartier, A. Langville, Kathryn E. Pedings-Behling","doi":"10.3934/FODS.2021002","DOIUrl":"https://doi.org/10.3934/FODS.2021002","url":null,"abstract":"In prior work [ 4 ], Anderson et al. introduced a new problem, the rankability problem, which refers to a dataset's inherent ability to produce a meaningful ranking of its items. Ranking is a fundamental data science task with numerous applications that include web search, data mining, cybersecurity, machine learning, and statistical learning theory. Yet little attention has been paid to the question of whether a dataset is suitable for ranking. As a result, when a ranking method is applied to a dataset with low rankability, the resulting ranking may not be reliable. Rankability paper [ 4 ] and its methods studied unweighted data for which the dominance relations are binary, i.e., an item either dominates or is dominated by another item. In this paper, we extend rankability methods to weighted data for which an item may dominate another by any finite amount. We present combinatorial approaches to a weighted rankability measure and apply our new measure to several weighted datasets.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70248064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning landmark geodesics using the ensemble Kalman filter","authors":"Andreas Bock, C. Cotter","doi":"10.3934/fods.2021020","DOIUrl":"https://doi.org/10.3934/fods.2021020","url":null,"abstract":"We study the problem of diffeomorphometric geodesic landmark matching where the objective is to find a diffeomorphism that, via its group action, maps between two sets of landmarks. It is well-known that the motion of the landmarks, and thereby the diffeomorphism, can be encoded by an initial momentum leading to a formulation where the landmark matching problem can be solved as an optimisation problem over such momenta. The novelty of our work lies in the application of a derivative-free Bayesian inverse method for learning the optimal momentum encoding the diffeomorphic mapping between the template and the target. The method we apply is the ensemble Kalman filter, an extension of the Kalman filter to nonlinear operators. We describe an efficient implementation of the algorithm and show several numerical results for various target shapes.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70248353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yossi Bokor Bleile, Katharine Turner, Christopher Williams
{"title":"Reconstructing linearly embedded graphs: A first step to stratified space learning","authors":"Yossi Bokor Bleile, Katharine Turner, Christopher Williams","doi":"10.3934/fods.2021026","DOIUrl":"https://doi.org/10.3934/fods.2021026","url":null,"abstract":"In this paper, we consider the simplest class of stratified spaces – linearly embedded graphs. We present an algorithm that learns the abstract structure of an embedded graph and models the specific embedding from a point cloud sampled from it. We use tools and inspiration from computational geometry, algebraic topology, and topological data analysis and prove the correctness of the identified abstract structure under assumptions on the embedding. The algorithm is implemented in the Julia package Skyler, which we used for the numerical simulations in this paper.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70248419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Score matching filters for Gaussian Markov random fields with a linear model of the precision matrix","authors":"Marie Turčičová, J. Mandel, K. Eben","doi":"10.3934/fods.2021030","DOIUrl":"https://doi.org/10.3934/fods.2021030","url":null,"abstract":"We present an ensemble filtering method based on a linear model for the precision matrix (the inverse of the covariance) with the parameters determined by Score Matching Estimation. The method provides a rigorous covariance regularization when the underlying random field is Gaussian Markov. The parameters are found by solving a system of linear equations. The analysis step uses the inverse formulation of the Kalman update. Several filter versions, differing in the construction of the analysis ensemble, are proposed, as well as a Score matching version of the Extended Kalman Filter.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"70248469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Evensen, Javier Amezcua, M. Bocquet, A. Carrassi, A. Farchi, A. Fowler, P. Houtekamer, C. Jones, R. Moraes, M. Pulido, C. Sampson, F. Vossepoel
{"title":"An international initiative of predicting the SARS-CoV-2 pandemic using ensemble data assimilation","authors":"G. Evensen, Javier Amezcua, M. Bocquet, A. Carrassi, A. Farchi, A. Fowler, P. Houtekamer, C. Jones, R. Moraes, M. Pulido, C. Sampson, F. Vossepoel","doi":"10.3934/fods.2021001","DOIUrl":"https://doi.org/10.3934/fods.2021001","url":null,"abstract":"This work demonstrates the efficiency of using iterative ensemble smoothers to estimate the parameters of an SEIR model. We have extended a standard SEIR model with age-classes and compartments of sick, hospitalized, and dead. The data conditioned on are the daily numbers of accumulated deaths and the number of hospitalized. Also, it is possible to condition the model on the number of cases obtained from testing. We start from a wide prior distribution for the model parameters; then, the ensemble conditioning leads to a posterior ensemble of estimated parameters yielding model predictions in close agreement with the observations. The updated ensemble of model simulations has predictive capabilities and include uncertainty estimates. In \u0000particular, we estimate the effective reproductive number as a function of time, and we can assess the impact of different intervention measures. By starting from the updated set of model parameters, we can make accurate short-term predictions of the epidemic development assuming \u0000knowledge of the future effective reproductive number. Also, the model system allows for the computation of long-term scenarios of the epidemic under different assumptions. We have applied the model system on data sets from several countries, i.e., the four European countries Norway, England, The Netherlands, and France; the province of Quebec in Canada; the South American countries Argentina and Brazil; and the four US states Alabama, North Carolina, California, and New York. These countries and states all have vastly different developments of the epidemic, and we could accurately model the SARS-CoV-2 outbreak in all of them. We realize that more complex models, e.g., with regional compartments, may be desirable, and we suggest that the approach used here should be applicable also for these models.","PeriodicalId":73054,"journal":{"name":"Foundations of data science (Springfield, Mo.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43519659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}