Authors: Changjo Yu, Sungkyu Jung, Jisu Kim
DOI: 10.1002/sta4.636 (https://doi.org/10.1002/sta4.636)
Journal: Stat
Published: 2023-12-17
Publication type: Journal Article
Impact factor: 0.7; JCR quartile: Q3 (Statistics & Probability)
Significance of modes in the torus by topological data analysis
This paper addresses the problem of identifying modes or density bumps in multivariate angular or circular data, which have diverse applications in fields like medicine, biology and physics. We focus on the use of topological data analysis and persistent homology for this task. Specifically, we extend the methods for uncertainty quantification in the context of a torus sample space, where circular data lie. To achieve this, we employ two types of density estimators, namely, the von Mises kernel density estimator and the von Mises mixture model, to compute persistent homology, and propose a scale-space view for searching significant bumps in the density. The results of bump hunting are summarised and visualised through a scale-space diagram. Our approach using the mixture model for persistent homology offers advantages over conventional methods, allowing for dendrogram visualisation of components and identification of mode locations. For testing whether a detected mode is really there, we propose several inference tools based on bootstrap resampling and concentration inequalities, establishing their theoretical applicability. Experimental results on SARS-CoV-2 spike glycoprotein torsion angle data demonstrate the effectiveness of our proposed methods in practice.
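As a rough illustration of the first density estimator named in the abstract, the sketch below evaluates a product von Mises kernel density estimate on a torus. It is a generic textbook form under stated assumptions, not the authors' implementation: the function name `von_mises_kde_torus`, the use of a single shared concentration `kappa` for every coordinate, and the toy data are all hypothetical choices made here for illustration.

```python
import numpy as np


def von_mises_kde_torus(points, data, kappa):
    """Product von Mises kernel density estimate on the d-torus.

    points: (m, d) array of evaluation angles in radians.
    data:   (n, d) array of observed angles in radians.
    kappa:  shared concentration (assumption: one value for all
            coordinates; larger kappa means less smoothing).
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    data = np.atleast_2d(np.asarray(data, dtype=float))
    d = data.shape[1]
    # Pairwise angular differences, shape (m, n, d). cos is 2*pi-periodic,
    # so the angles do not need to be wrapped explicitly.
    diff = points[:, None, :] - data[None, :, :]
    # Log of the unnormalised product kernel, summed over coordinates.
    log_kernel = kappa * np.cos(diff).sum(axis=2)
    # Each von Mises factor is normalised by 2*pi*I0(kappa).
    log_norm = d * (np.log(2.0 * np.pi) + np.log(np.i0(kappa)))
    # Average the normalised kernels over the n observations.
    return np.exp(log_kernel - log_norm).mean(axis=1)


# Toy usage: three made-up pairs of angles, evaluated on a periodic grid.
toy_data = np.array([[0.0, 0.0], [1.0, -2.0], [3.0, 3.0]])
axis = np.linspace(-np.pi, np.pi, 64, endpoint=False)
grid = np.array([(a, b) for a in axis for b in axis])
density = von_mises_kde_torus(grid, toy_data, kappa=5.0)
```

Because the estimate is evaluated on a grid that wraps around in both coordinates, its superlevel sets could then be passed to a persistent homology routine that supports periodic boundaries; that step is not shown here.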
Stat
Subject area: Decision Sciences - Statistics, Probability and Uncertainty
CiteScore: 1.10
Self-citation rate: 0.00%
Articles published: 85
About the journal:
Stat is an innovative electronic journal for the rapid publication of novel and topical research results, publishing compact articles of the highest quality in all areas of statistical endeavour. Its purpose is to provide a means of rapid sharing of important new theoretical, methodological and applied research. Stat is a joint venture between the International Statistical Institute and Wiley-Blackwell.
Stat is characterised by:
• Speed - a high-quality review process that aims to reach a decision within 20 days of submission.
• Concision - a maximum article length of 10 pages of text, not including references.
• Supporting materials - inclusion of electronic supporting materials including graphs, video, software, data and images.
• Scope - addresses all areas of statistics and interdisciplinary areas.
Stat is a scientific journal for the international community of statisticians and researchers and practitioners in allied quantitative disciplines.