Aptera: Automatic PARAFAC2 Tensor Analysis

2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). Pub Date: 2022-11-10. DOI: 10.1109/ASONAM55673.2022.10068699

Ekta Gujral, E. Papalexakis


Citations: 1

Abstract


In data mining, PARAFAC2 is a powerful multi-layer tensor decomposition method that is ideally suited to unsupervised modeling of data that form “irregular” tensors, e.g., patients' diagnostic profiles, where each patient's recovery timeline does not necessarily align with those of other patients. In real-world applications, where no ground truth is available, how can we automatically choose how many components to analyze? Although the question seems trivial, finding the number of components is very hard. So far, the traditional way to determine a reasonable number of components for PARAFAC2 has been to compute the decomposition with different numbers of components and then analyze the outcomes manually. This is inefficient and time-consuming, first because of the large data volume and second because human evaluation biases the selection. In this paper, we introduce Aptera, a novel automatic PARAFAC2 tensor mining method based on locating the L-curve corner. Automating PARAFAC2 model quality assessment helps both novice and experienced researchers conduct detailed and advanced analysis. We extensively evaluate Aptera's performance on synthetic data, where it outperforms existing state-of-the-art methods on this very hard problem. Finally, we apply Aptera to a variety of real-world datasets and demonstrate its robustness, scalability, and estimation reliability.
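
To make the idea of locating an L-curve corner concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that a decomposition has already been computed for each candidate number of components and that a fit error is available for each. It then picks the corner with a generic heuristic: the point farthest from the straight line joining the two ends of the normalized curve. The function name `l_curve_corner` and the error values are hypothetical and serve only as illustration; the abstract does not state which corner criterion Aptera uses.

```python
import math


def l_curve_corner(ranks, errors):
    """Return the rank at the corner of an error-vs-rank curve.

    Generic heuristic (an assumption, not taken from the paper): after
    scaling both axes to [0, 1], choose the point with the largest
    perpendicular distance to the chord joining the first and last points.
    """
    if len(ranks) != len(errors) or len(ranks) < 3:
        raise ValueError("need at least three (rank, error) pairs")
    if ranks[0] == ranks[-1]:
        raise ValueError("first and last rank must differ")

    # Normalize both axes so that neither one dominates the distance.
    rank_span = ranks[-1] - ranks[0]
    err_min, err_max = min(errors), max(errors)
    err_span = (err_max - err_min) or 1.0  # guard against a flat curve
    xs = [(r - ranks[0]) / rank_span for r in ranks]
    ys = [(e - err_min) / err_span for e in errors]

    # Chord between the two end points of the normalized curve.
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    chord = math.hypot(x1 - x0, y1 - y0)

    best_rank, best_dist = ranks[0], -1.0
    for r, x, y in zip(ranks, xs, ys):
        # Perpendicular distance from (x, y) to the chord.
        dist = abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / chord
        if dist > best_dist:
            best_rank, best_dist = r, dist
    return best_rank


# Hypothetical fit errors for 1..8 components (illustrative values only).
candidate_ranks = list(range(1, 9))
fit_errors = [0.90, 0.62, 0.35, 0.12, 0.10, 0.09, 0.085, 0.08]
print(l_curve_corner(candidate_ranks, fit_errors))  # prints 4
```

In this toy curve the error drops steeply up to four components and flattens afterwards, so the heuristic selects 4. Other corner criteria, such as maximum curvature, would be equally valid stand-ins here.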
