Topological Representation of Rare States Using Combination of Persistent Homology and Complexity Measures

Rebecca Miao, Zhenyi Yang, V. Gavrishchaka
Published in: 2020 3rd International Conference on Information and Computer Technologies (ICICT)
Publication date: 2020-03-01
DOI: 10.1109/ICICT50521.2020.00025 (https://doi.org/10.1109/ICICT50521.2020.00025)
Citation count: 1

Abstract

Identifying rare states and training models with limited data are fundamentally challenging for mainstream machine learning. Alternative approaches include one-shot learning, which uses similarities to reference classes; meta-learning, which trains on many related tasks; and transfer learning, which uses a relevant pre-trained model. However, their performance deteriorates quickly as the number of available reference classes and related tasks decreases, or when no relevant problem is available for transfer learning. Previously, we proposed ensemble decomposition learning (EDL), in which boosting-ensemble components trained on just two broad classes provide a large number of implicit reference classes. Domain-expert knowledge such as complexity measures can be incorporated directly within EDL to reduce dependence on training data. However, the success of EDL and similar approaches requires a variety of complexity measures that are sufficiently flexible for further tuning given enough data, which is not always available. Therefore, the addition of complementary measures that do not require fine-tuning is important. Persistent homology (PH), one of the tools of computational topology, offers a noise-tolerant topological summary of a data set. Direct application of PH to high-dimensional data is often prohibitive and requires domain-specific dimensionality reduction. Here we suggest that PH computed on complexity measures rather than on raw data could provide robust complementary metrics for enhancing rare-state representation, as illustrated in the context of a personalized medicine application using data from www.physionet.org.
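The central idea, computing PH on complexity measures instead of raw data, can be illustrated with a minimal sketch. The abstract does not name the complexity measures or the PH implementation used, so everything here is an assumption made for illustration: permutation entropy and standard deviation stand in for the complexity measures, only 0-dimensional persistence of a Vietoris-Rips filtration is computed (its death times equal the edge lengths of a minimum spanning tree), and the function names `permutation_entropy`, `complexity_features`, and `h0_persistence` are hypothetical.

```python
import math
from itertools import combinations


def permutation_entropy(window, order=3):
    """Normalized permutation entropy of a sequence (0 = fully regular)."""
    counts = {}
    for i in range(len(window) - order + 1):
        chunk = window[i:i + order]
        # Ordinal pattern: the index order that sorts this chunk.
        pattern = tuple(sorted(range(order), key=chunk.__getitem__))
        counts[pattern] = counts.get(pattern, 0) + 1
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(math.factorial(order))


def std_dev(window):
    """Population standard deviation of a sequence."""
    mean = sum(window) / len(window)
    return math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))


def complexity_features(series, width, step):
    """Map each sliding window to a 2-D point of (assumed) complexity measures."""
    return [
        (permutation_entropy(series[s:s + width]), std_dev(series[s:s + width]))
        for s in range(0, len(series) - width + 1, step)
    ]


def h0_persistence(points):
    """Death times of 0-dimensional Vietoris-Rips classes (all born at 0).

    Computed with Kruskal's algorithm: each minimum-spanning-tree edge
    merges two connected components, which ends one homology class.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths  # n - 1 finite bars; one component never dies
```

Calling `h0_persistence(complexity_features(series, width, step))` on a physiological time series would give a short list of bar lengths that summarizes how the windows cluster in complexity-measure space. Higher-dimensional homology would need a dedicated PH library and is outside this sketch.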