Aptera: Automatic PARAFAC2 Tensor Analysis
Ekta Gujral, E. Papalexakis
2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2022-11-10
DOI: https://doi.org/10.1109/ASONAM55673.2022.10068699
In data mining, PARAFAC2 is a powerful multi-layer tensor decomposition method that is well suited to unsupervised modeling of data that forms “irregular” tensors, e.g., patients' diagnostic profiles, where each patient's recovery timeline does not necessarily align with those of other patients. In real-world applications, where no ground truth is available, how can we automatically choose how many components to analyze? Although the question is simple to state, finding the number of components is very hard. So far, the traditional way to determine a reasonable number of components for a PARAFAC2 model has been to compute the decomposition with several different numbers of components and then analyze the outcomes manually. This path is inefficient and time-consuming: first, because of the large data volume, and second, because human evaluation biases the selection. In this paper, we introduce Aptera, a novel automatic PARAFAC2 tensor mining method that is based on locating the L-curve corner. Automating the PARAFAC2 model quality assessment helps both novice and experienced researchers conduct detailed and advanced analysis. We extensively evaluate Aptera's performance on synthetic data, where it outperforms existing state-of-the-art methods on this very hard problem. Finally, we apply Aptera to a variety of real-world datasets and demonstrate its robustness, scalability, and estimation reliability.
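
The abstract does not say how Aptera locates the L-curve corner, so the following is only a generic illustration of the idea, not the authors' procedure. It assumes a fit error has already been computed for each candidate number of components, and it picks the point farthest from the chord joining the two ends of the curve, a common corner heuristic. The function name `l_curve_corner`, its parameters, and the error values are hypothetical and invented for this example.

```python
import numpy as np


def l_curve_corner(ranks, errors):
    """Return the rank at the corner of an L-shaped error curve.

    Hypothetical helper: uses the maximum-distance-to-chord heuristic,
    which is an assumption and not necessarily what Aptera does.
    """
    x = np.asarray(ranks, dtype=float)
    y = np.asarray(errors, dtype=float)
    if x.size != y.size or x.size < 3:
        raise ValueError("need at least three (rank, error) pairs of equal length")

    # Normalise both axes to [0, 1] so neither scale dominates the distance.
    span_x = x.max() - x.min()
    span_y = y.max() - y.min()
    if span_x == 0 or span_y == 0:
        raise ValueError("curve is flat; there is no corner to locate")
    xn = (x - x.min()) / span_x
    yn = (y - y.min()) / span_y

    # Chord from the first point to the last point of the curve.
    dx = xn[-1] - xn[0]
    dy = yn[-1] - yn[0]

    # Perpendicular distance of every point from that chord.
    dist = np.abs(dx * (yn - yn[0]) - dy * (xn - xn[0])) / np.hypot(dx, dy)

    # The corner is the point that bulges farthest from the chord.
    return int(ranks[int(np.argmax(dist))])


# Made-up fit errors for 1..8 components (illustrative only, not paper data).
candidate_ranks = list(range(1, 9))
fit_errors = [10.0, 6.0, 3.0, 1.0, 0.9, 0.85, 0.8, 0.78]
print(l_curve_corner(candidate_ranks, fit_errors))  # prints 4
```

In this toy curve the error drops steeply up to four components and flattens afterwards, so the heuristic selects 4. In practice the errors would come from fitting a PARAFAC2 model once per candidate rank, which is the costly step the abstract describes.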