Foundations of data science (Springfield, Mo.): latest articles, page 3

Applying topological data analysis to local search problems

Foundations of data science (Springfield, Mo.) Pub Date : 2022-01-01 DOI: 10.3934/fods.2022006

Erik Carlsson, J. Carlsson, Shannon Sweitzer

Abstract: We present an application of topological data analysis (TDA) to discrete optimization problems, which we show can improve the performance of the 2-opt local search method for the traveling salesman problem by simply applying standard Vietoris-Rips construction to a data set of trials. We then construct a simplicial complex which is specialized for this sort of simulated data set, determined by a stochastic matrix with a steady state vector $(P,\pi)$. When $P$ is induced from a random walk on a finite metric space, this complex exhibits similarities with standard constructions such as Vietoris-Rips on the data set, but is not sensitive to outliers, as sparsity is a natural feature of the construction. We interpret the persistent homology groups in several examples coming from random walks and discrete optimization, and illustrate how higher dimensional Betti numbers can be used to classify connected components, i.e. zero dimensional features in higher dimensions.

Citations: 2

Multimodal correlations-based data clustering

Foundations of data science (Springfield, Mo.) Pub Date : 2022-01-01 DOI: 10.3934/fods.2022011

Jia Chen, I. Schizas

Abstract: This work proposes a novel technique for clustering multimodal data according to their information content. Statistical correlations present in data that contain similar information are exploited to perform the clustering task. Specifically, multiset canonical correlation analysis is equipped with norm-one regularization mechanisms to identify clusters within different types of data that share the same information content. A pertinent minimization formulation is put forth, while block coordinate descent is employed to derive a batch clustering algorithm which achieves better clustering performance than existing alternatives. Relying on subgradient descent, an online clustering approach is derived which substantially lowers computational complexity compared to the batch approach, while not compromising significantly the clustering performance. It is established that for an increasing number of data the novel regularized multiset framework is able to correctly cluster the multimodal data entries. Further, it is proved that the online clustering scheme converges with probability one to a stationary point of the ensemble regularized multiset correlations cost having the potential to recover the correct clusters. Extensive numerical tests demonstrate that the novel clustering scheme outperforms existing alternatives, while the online scheme achieves substantial computational savings.

Citations: 0

HOMOTOPY CONTINUATION FOR THE SPECTRA OF PERSISTENT LAPLACIANS.

Foundations of data science (Springfield, Mo.) Pub Date : 2021-12-01 DOI: 10.3934/fods.2021017

Xiaoqi Wei, Guo-Wei Wei

Abstract: The p-persistent q-combinatorial Laplacian defined for a pair of simplicial complexes is a generalization of the q-combinatorial Laplacian. Given a filtration, the spectra of persistent combinatorial Laplacians not only recover the persistent Betti numbers of persistent homology but also provide extra multiscale geometrical information of the data. Paired with machine learning algorithms, the persistent Laplacian has many potential applications in data science. Seeking different ways to find the spectrum of an operator is an active research topic, becoming interesting when ideas are originated from multiple fields. In this work, we explore an alternative approach for the spectrum of persistent Laplacians. As the eigenvalues of a persistent Laplacian matrix are the roots of its characteristic polynomial, one may attempt to find the roots of the characteristic polynomial by homotopy continuation, and thus resolving the spectrum of the corresponding persistent Laplacian. We consider a set of simple polytopes and small molecules to prove the principle that algebraic topology, combinatorial graph, and algebraic geometry can be integrated to understand the shape of data.

Volume 3, Issue 4, pp. 677-700. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9273002/pdf/nihms-1768199.pdf

Citations: 2

Analysis of the feedback particle filter with diffusion map based approximation of the gain

Foundations of data science (Springfield, Mo.) Pub Date : 2021-09-06 DOI: 10.3934/fods.2021023

S. Pathiraja, W. Stannat

Citations: 3

Fast computation of persistent homology representatives with involuted persistent homology

Foundations of data science (Springfield, Mo.) Pub Date : 2021-05-08 DOI: 10.3934/fods.2023006

Matija Čufar, Žiga Virk

Citations: 8

HERMES: PERSISTENT SPECTRAL GRAPH SOFTWARE.

Foundations of data science (Springfield, Mo.) Pub Date : 2021-03-01 DOI: 10.3934/fods.2021006

Rui Wang, Rundong Zhao, Emily Ribando-Gros, Jiahui Chen, Yiying Tong, Guo-Wei Wei

Abstract: Persistent homology (PH) is one of the most popular tools in topological data analysis (TDA), while graph theory has had a significant impact on data science. Our earlier work introduced the persistent spectral graph (PSG) theory as a unified multiscale paradigm to encompass TDA and geometric analysis. In PSG theory, families of persistent Laplacian matrices (PLMs) corresponding to various topological dimensions are constructed via a filtration to sample a given dataset at multiple scales. The harmonic spectra from the null spaces of PLMs offer the same topological invariants, namely persistent Betti numbers, at various dimensions as those provided by PH, while the non-harmonic spectra of PLMs give rise to additional geometric analysis of the shape of the data. In this work, we develop an open-source software package, called highly efficient robust multidimensional evolutionary spectra (HERMES), to enable broad applications of PSGs in science, engineering, and technology. To ensure the reliability and robustness of HERMES, we have validated the software with simple geometric shapes and complex datasets from three-dimensional (3D) protein structures. We found that the smallest non-zero eigenvalues are very sensitive to data abnormality.

Volume 3, Issue 1, pp. 67-97. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8411887/pdf/nihms-1717421.pdf

Citations: 0

A study of disproportionately affected populations by race/ethnicity during the SARS-CoV-2 pandemic using multi-population SEIR modeling and ensemble data assimilation

Foundations of data science (Springfield, Mo.) Pub Date : 2021-01-01 DOI: 10.3934/fods.2021022

Emmanuel Fleurantin, C. Sampson, Daniel P. Maes, Justin P. Bennett, Tayler Fernandes-Nunez, S. Marx, G. Evensen

Abstract: The disparity in the impact of COVID-19 on minority populations in the United States has been well established in the available data on deaths, case counts, and adverse outcomes. However, critical metrics used by public health officials and epidemiologists, such as a time dependent viral reproductive number ($R_t$), can be hard to calculate from this data especially for individual populations. Furthermore, disparities in the availability of testing, record keeping infrastructure, or government funding in disadvantaged populations can produce incomplete data sets. In this work, we apply ensemble data assimilation techniques which optimally combine model and data to produce a more complete data set providing better estimates of the critical metrics used by public health officials and epidemiologists. We employ a multi-population SEIR (Susceptible, Exposed, Infected and Recovered) model with a time dependent reproductive number and age stratified contact rate matrix for each population. We assimilate the daily death data for populations separated by ethnic/racial groupings using a technique called Ensemble Smoothing with Multiple Data Assimilation (ESMDA) to estimate model parameters and produce an $R_t(n)$ for the $n^{th}$ population. We do this with three distinct approaches, (1) using the same contact matrices and prior $R_t(n)$ for each population, (2) assigning contact matrices with increased contact rates for working age and older adults to populations experiencing disparity and (3) as in (2) but with a time-continuous update to $R_t(n)$. We make a study of 9 U.S. states and the District of Columbia providing a complete time series of the pandemic in each and, in some cases, identifying disparities not otherwise evident in the aggregate statistics.

Citations: 1

Intrinsic disease maps using persistent cohomology

Foundations of data science (Springfield, Mo.) Pub Date : 2021-01-01 DOI: 10.3934/FODS.2021008

Daniel Amin, Mikael Vejdemo-Johansson

Citations: 1

ToFU: Topology functional units for deep learning

Foundations of data science (Springfield, Mo.) Pub Date : 2021-01-01 DOI: 10.3934/fods.2021021

Christopher Oballe, D. Boothe, P. Franaszczuk, V. Maroulas

Citations: 3

A density-based approach to feature detection in persistence diagrams for firn data

Foundations of data science (Springfield, Mo.) Pub Date : 2021-01-01 DOI: 10.3934/FODS.2021012

A. Lawson, Tyler Hoffman, Yu-Min Chung, K. Keegan, S. Day

Abstract: Topological data analysis, and in particular persistence diagrams, are gaining popularity as tools for extracting topological information from noisy point cloud and digital data. Persistence diagrams track topological features in the form of $k$-dimensional holes in the data. Here, we construct a new, automated approach for identifying persistence diagram points that represent robust long-life features. These features may be used to provide a more accurate estimate of Betti numbers for the underlying space. This approach extends the established practice of using a lifespan cutoff on the features in order to take advantage of the observation that noisy features typically appear in clusters in the persistence diagram. We show that this approach offers more flexibility in partitioning features in the persistence diagram, resulting in greater accuracy in computed Betti numbers, especially in the case of high noise levels and varying image illumination. This work is motivated by 3-dimensional Micro-CT imaging of ice core samples, and is applicable for separating noise from robust signals in persistence diagrams from noisy data.

Citations: 2