Advances in Data Analysis and Classification: Latest Articles, Page 6

Applications of dual regularized Laplacian matrix for community detection

IF 1.4 · Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2023-10-26 DOI: 10.1007/s11634-023-00565-3

Huan Qing, Jingli Wang

Abstract: Spectral clustering is widely used for community detection in networks, and a small change to the graph Laplacian matrix can bring a dramatic improvement. In this paper, we propose a dual regularized graph Laplacian matrix and apply it to the classical spectral clustering approach under the degree-corrected stochastic block model. If the number of communities is known to be K, we consider more than K leading eigenvectors and weight them by their corresponding eigenvalues in the spectral clustering procedure to improve performance. The improved method is called dual regularized spectral clustering (DRSC). Theoretical analysis of DRSC shows that under mild conditions it yields stable, consistent community detection. We also develop a strategy that combines DRSC with Newman's modularity to estimate the number of communities K. We compare the performance of DRSC with several spectral methods and investigate the behavior of our strategy for estimating K on a substantial number of simulated and real-world networks. Numerical results show that DRSC performs satisfactorily and that our strategy for estimating K is accurate and consistent, even when a network contains only one community.

Volume 18, issue 4, pp. 1001-1043.

Citations: 0
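
The spectral clustering idea in the entry above (use more than K leading eigenvectors and weight them by their eigenvalues) can be illustrated with a minimal sketch. The listing does not give the definition of the dual regularized Laplacian, so the code substitutes a commonly used regularized form, D_tau^{-1/2} A D_tau^{-1/2} with D_tau = D + tau*I. That substitution, the default tau (the mean degree), the choice of K + 1 eigenvectors, the row normalization, the small deterministic k-means, and all function names are assumptions made for illustration, not the authors' DRSC.

```python
import numpy as np

def regularized_laplacian(A, tau):
    """D_tau^{-1/2} A D_tau^{-1/2} with D_tau = D + tau * I (assumed form)."""
    inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1) + tau)
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def weighted_spectral_embedding(A, n_vectors, tau=None):
    """Rows of the leading eigenvectors, each column scaled by its eigenvalue."""
    if tau is None:
        tau = A.sum(axis=1).mean()  # assumption: mean degree as regularizer
    vals, vecs = np.linalg.eigh(regularized_laplacian(A, tau))
    order = np.argsort(-np.abs(vals))[:n_vectors]  # largest magnitude first
    X = vecs[:, order] * vals[order]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms > 0, norms, 1.0)  # row-normalize

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def two_cliques(m):
    """Toy network: two m-node cliques joined by a single edge."""
    A = np.zeros((2 * m, 2 * m))
    A[:m, :m] = 1.0
    A[m:, m:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[m - 1, m] = A[m, m - 1] = 1.0
    return A

K = 2
A = two_cliques(8)
X = weighted_spectral_embedding(A, n_vectors=K + 1)  # more than K eigenvectors
labels = kmeans(X, K)
print(labels)
```

On this toy network the two cliques should land in different clusters. Because the extra eigenvector is scaled by a small eigenvalue, it contributes little to the distances that k-means sees.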

A new model for counterfactual analysis for functional data

IF 1.4 · Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2023-10-25 DOI: 10.1007/s11634-023-00563-5

Emilio Carrizosa, Jasone Ramírez-Ayerbe, Dolores Romero Morales

Citations: 0

Profile-based latent class distance association analyses for sparse tables: application to the attitude of European citizens towards sustainable tourism

IF 1.4 · Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2023-10-18 DOI: 10.1007/s11634-023-00559-1

Francesca Bassi, José Fernando Vera, Juan Antonio Marmolejo Martín

Abstract: Social and behavioural sciences often deal with the analysis of associations for cross-classified data. This paper focuses on the patterns observed among European citizens regarding their attitude towards sustainable tourism, specifically their willingness to change travel and tourism habits to be more sustainable. The data record the intention to comply with nine sustainable actions; the answers to these questions generate individual profiles, and each respondent's country is also reported. Therefore, unlike a variable-oriented approach, here we are interested in a person-oriented approach through profiles. Some traditional methods are limited in their performance when using profiles, for example by sparseness of the contingency table. We removed many of these limitations by using a latent class distance association model, clustering the row profiles into classes and representing these together with the categories of the response variable in a low-dimensional space. We showed, furthermore, that an easy interpretation of associations between cluster centres and categories of a response variable can be incorporated in this framework in an intuitive way using unfolding. The analyses indicate that the citizens most committed to environmentally friendly behaviour live in Sweden and Romania, while citizens less willing to change their habits towards more sustainable behaviour live in Belgium, Cyprus, France, Lithuania and the Netherlands. Citizens' preparedness to change habits, however, also depends on socio-demographic characteristics such as gender, age, occupation, type of community, household size, and the frequency of travelling before the Covid-19 pandemic.

Volume 18, issue 4, pp. 953-980. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s11634-023-00559-1.pdf

Citations: 0

Editorial for ADAC issue 4 of volume 17 (2023)

IF 1.6 · Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2023-10-14 DOI: 10.1007/s11634-023-00564-4

Maurizio Vichi, Andrea Cerioli, Hans A. Kestler, Akinori Okada, Claus Weihs

Citations: 0

Discovering interpretable structure in longitudinal predictors via coefficient trees

IF 1.4 · Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2023-10-11 DOI: 10.1007/s11634-023-00562-6

Özge Sürer, Daniel W. Apley, Edward C. Malthouse

Abstract: We consider the regression setting in which the response variable is not longitudinal (i.e., it is observed once for each case), but is assumed to depend functionally on a set of predictors that are observed longitudinally, which is a specific form of functional predictors. In this situation, we often expect that observations of the same predictor at nearby time points are more likely to be associated with the response in the same way. We can exploit this and discover groups of predictors that share the same (or similar) coefficient according to their temporal proximity. We propose a new algorithm called coefficient tree regression, for data in which a non-longitudinal response depends on longitudinal predictors, to efficiently discover the underlying temporal characteristics of the data. The approach results in a simple and highly interpretable tree structure that exposes the hierarchical relationships between groups of predictors that affect the response in a similar manner based on their temporal proximity, and we demonstrate with a real example that it can provide a clear and concise interpretation of the data. In numerical comparisons over a variety of examples, we show that our approach achieves substantially better predictive accuracy than existing competitors, most likely due to the inherent dimensionality reduction that is discovered automatically when fitting the model, in addition to having interpretability advantages and lower computational expense.

Volume 18, issue 4, pp. 911-951.

Citations: 0

Generalized Cramér’s coefficient via f-divergence for contingency tables

IF 1.4 · Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2023-10-05 DOI: 10.1007/s11634-023-00560-8

Wataru Urasaki, Tomoyuki Nakagawa, Tomotaka Momozaki, Sadao Tomizawa

Abstract: Various measures in two-way contingency table analysis have been proposed to express the strength of association between row and column variables in contingency tables. Tomizawa et al. (2004) proposed more general measures, including Cramér’s coefficient, using the power-divergence. In this paper, we propose measures using the f-divergence that has a wider class than the power-divergence. Unlike statistical hypothesis tests, these measures provide quantification of the association structure in contingency tables. The contribution of our study is proving that a measure applying a function that satisfies the condition of the f-divergence has desirable properties for measuring the strength of association in contingency tables. With this contribution, we can easily construct a new measure using a divergence that has essential properties for the analyst. For example, we conducted numerical experiments with a measure applying the θ-divergence. Furthermore, we can give further interpretation of the association between the row and column variables in the contingency table, which could not be obtained with the conventional one. We also show a relationship between our proposed measures and the correlation coefficient in a bivariate normal distribution of latent variables in the contingency tables.

Volume 18, issue 4, pp. 893-910. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s11634-023-00560-8.pdf

Citations: 0
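
As background for the entry above, the classical Cramér coefficient can be written as an f-divergence between the joint cell probabilities and the product of the marginals. The sketch below covers only that textbook case, with f(t) = (t - 1)^2 giving Pearson's phi-squared, plus f(t) = t log t, which gives mutual information. It does not reproduce the paper's generalized measure or its normalization for other divergences, and the function names are hypothetical.

```python
import numpy as np

def f_divergence_association(table, f):
    """Sum over cells of q_ij * f(p_ij / q_ij), where q_ij = p_i. * p_.j."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    q = np.outer(p.sum(axis=1), p.sum(axis=0))  # independence model
    mask = q > 0  # skip cells whose row or column is empty
    return float(np.sum(q[mask] * f(p[mask] / q[mask])))

def pearson_f(t):
    """f(t) = (t - 1)^2, which yields Pearson's phi-squared (chi2 / n)."""
    return (t - 1.0) ** 2

def kl_f(t):
    """f(t) = t * log(t) with the convention 0 * log(0) = 0."""
    safe = np.where(t > 0, t, 1.0)
    return t * np.log(safe)

def cramers_v(table):
    """Classical Cramér's V: sqrt(phi^2 / (min(rows, cols) - 1))."""
    r, c = np.shape(table)
    phi2 = f_divergence_association(table, pearson_f)
    return float(np.sqrt(phi2 / (min(r, c) - 1)))

perfect = [[10, 0], [0, 10]]      # rows determine columns exactly
independent = [[5, 5], [5, 5]]    # no association

v_perfect = cramers_v(perfect)
v_independent = cramers_v(independent)
mi_perfect = f_divergence_association(perfect, kl_f)
print(v_perfect, v_independent, mi_perfect)
```

For the perfectly associated table the coefficient is 1 and the mutual information equals log 2. For the independent table both divergences are 0.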

Mixture modeling with normalizing flows for spherical density estimation

IF 1.4 · Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2023-10-04 DOI: 10.1007/s11634-023-00561-7

Tin Lok James Ng, Andrew Zammit-Mangion

Abstract: Normalizing flows are objects used for modeling complicated probability density functions, and have attracted considerable interest in recent years. Many flexible families of normalizing flows have been developed. However, the focus to date has largely been on normalizing flows on Euclidean domains; while normalizing flows have been developed for spherical and other non-Euclidean domains, these are generally less flexible than their Euclidean counterparts. To address this shortcoming, in this work we introduce a mixture-of-normalizing-flows model to construct complicated probability density functions on the sphere. This model provides a flexible alternative to existing parametric, semiparametric, and nonparametric finite mixture models. Model estimation is performed using the expectation maximization algorithm and a variant thereof. The model is applied to simulated data, where the benefit over the conventional (single component) normalizing flow is verified. The model is then applied to two real-world data sets of events occurring on the surface of Earth; the first relating to earthquakes, and the second to terrorist activity. In both cases, we see that the mixture-of-normalizing-flows model yields a good representation of the density of event occurrence.

Volume 18, issue 1, pp. 103-120.

Citations: 0

Parsimony and parameter estimation for mixtures of multivariate leptokurtic-normal distributions

IF 1.4 · Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2023-09-27 DOI: 10.1007/s11634-023-00558-2

Ryan P. Browne, Luca Bagnato, Antonio Punzo

Citations: 0

Theory of angular depth for classification of directional data

IF 1.4 · Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2023-09-23 DOI: 10.1007/s11634-023-00557-3

Stanislav Nagy, Houyem Demni, Davide Buttarazzi, Giovanni C. Porzio

Citations: 0

Co-clustering contaminated data: a robust model-based approach

IF 1.4 · Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2023-09-22 DOI: 10.1007/s11634-023-00549-3

Edoardo Fibbi, Domenico Perrotta, Francesca Torti, Stefan Van Aelst, Tim Verdonck

Citations: 0