Persistent obstruction theory for a model category of measures with applications to data merging

Abraham Smith, Paul Bendich, J. Harer
Transactions of the American Mathematical Society, Series B. Published 2019-11-26. DOI: 10.1090/BTRAN/56 (https://doi.org/10.1090/BTRAN/56)

Abstract

Collections of measures on compact metric spaces form a model category (“data complexes”), whose morphisms are marginalization integrals. The fibrant objects in this category represent collections of measures in which there is a measure on a product space that marginalizes to any measures on pairs of its factors. The homotopy and homology for this category allow measurement of obstructions to finding measures on larger and larger product spaces. The obstruction theory is compatible with a fibrant filtration built from the Wasserstein distance on measures.

Despite the abstract tools, this is motivated by a widespread problem in data science. Data complexes provide a mathematical foundation for semi-automated data-alignment tools that are common in commercial database software. Practically speaking, the theory shows that database JOIN operations are subject to genuine topological obstructions. Those obstructions can be detected by an obstruction cocycle and can be resolved by moving through a filtration. Thus, any collection of databases has a persistence level, which measures the difficulty of JOINing those databases. Because of its general formulation, this persistent obstruction theory also encompasses multi-modal data fusion problems, some forms of Bayesian inference, and probability couplings.
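
As a toy illustration of the kind of obstruction described in the abstract (not the construction used in the paper), the sketch below checks whether three pairwise measures on binary factors could possibly come from a single measure on the triple product. Any joint measure that marginalizes to the given pairwise measures must be supported on points whose pairwise projections all lie in the supports of those measures; that set is exactly the natural JOIN of the three two-column tables. The function name `consistent_support` and the example measures are hypothetical, chosen only for this illustration.

```python
from itertools import product


def consistent_support(pairwise, n_factors, values):
    """Return the points of values**n_factors whose projection onto every
    listed pair of factors has positive mass under that pair's measure.

    pairwise maps an index pair (i, j) to a dict {(a, b): mass}.
    An empty result means no joint probability measure can marginalize
    to all the pairwise measures. A nonempty result is only a necessary
    condition, not a proof that a joint measure exists.
    """
    return [
        point
        for point in product(values, repeat=n_factors)
        if all(
            measure.get((point[i], point[j]), 0.0) > 0.0
            for (i, j), measure in pairwise.items()
        )
    ]


# Hypothetical data: every pair of factors always disagrees.
# Each single-factor marginal is uniform on {0, 1}, so the pairwise
# measures agree on their overlaps, yet three binary values cannot
# all differ from one another.
anti = {(0, 1): 0.5, (1, 0): 0.5}
obstructed = {(0, 1): anti, (1, 2): anti, (0, 2): anti}

# Hypothetical data: every pair of factors always agrees.
agree = {(0, 0): 0.5, (1, 1): 0.5}
compatible = {(0, 1): agree, (1, 2): agree, (0, 2): agree}

print(consistent_support(obstructed, 3, (0, 1)))  # []
print(consistent_support(compatible, 3, (0, 1)))  # [(0, 0, 0), (1, 1, 1)]
```

In the first case the three-way JOIN is empty even though every pair of tables is individually consistent, which is the simplest instance of pairwise data that cannot be merged without relaxing the marginal constraints.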