Advances in Data Analysis and Classification: latest articles, page 8

Editorial for ADAC issue 2 of volume 17 (2023)

IF 1.6; Zone 4, Computer Science

Advances in Data Analysis and Classification Pub Date : 2023-05-12 DOI: 10.1007/s11634-023-00544-8

Maurizio Vichi, Andrea Cerioli, Hans A. Kestler, Akinori Okada, Claus Weihs

Citations: 0

Model-based clustering of functional data via mixtures of t distributions

IF 1.4; Zone 4, Computer Science

Advances in Data Analysis and Classification Pub Date : 2023-05-12 DOI: 10.1007/s11634-023-00542-w

Cristina Anton, Iain Smith

Abstract: We propose a procedure, called T-funHDDC, for clustering multivariate functional data with outliers. It extends the functional high dimensional data clustering (funHDDC) method (Schmutz et al. in Comput Stat 35:1101–1131, 2020) by considering a mixture of multivariate t distributions. We define a family of latent mixture models following the approach used for the parsimonious models considered in funHDDC, with the degrees of freedom of the multivariate t distributions either constrained to be equal across the mixture components or left unconstrained. The parameters of these models are estimated using an expectation maximization algorithm. In addition to proposing the T-funHDDC method, we add a family of parsimonious models to C-funHDDC, which is an alternative method for clustering multivariate functional data with outliers based on a mixture of contaminated normal distributions (Amovin-Assagba et al. in Comput Stat Data Anal 174:107496, 2022). We compare T-funHDDC, C-funHDDC, and other existing methods on simulated functional data with outliers and on real-world data. T-funHDDC outperforms funHDDC when applied to functional data with outliers, and its good performance makes it an alternative to C-funHDDC. We also apply the T-funHDDC method to the analysis of traffic flow in Edmonton, Canada.

Journal reference: Advances in Data Analysis and Classification, vol. 18, no. 3, pp. 563–595.
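The robustness to outliers that the abstract attributes to mixtures of t distributions can be illustrated with the weight that the standard EM algorithm for a multivariate t distribution gives each observation. The sketch below is a generic textbook illustration under that assumption, not the T-funHDDC implementation; the function name `t_weight` and the toy numbers are hypothetical.

```python
def t_weight(sq_mahalanobis: float, dof: float, dim: int) -> float:
    """E-step weight for a multivariate t component.

    u = (dof + dim) / (dof + squared Mahalanobis distance).
    Observations far from the component centre get a small weight,
    so they influence the mean and scatter updates less.
    """
    return (dof + dim) / (dof + sq_mahalanobis)


# Hypothetical values: a typical point and an outlying point in 2 dimensions.
typical = t_weight(sq_mahalanobis=2.0, dof=4.0, dim=2)
outlier = t_weight(sq_mahalanobis=20.0, dof=4.0, dim=2)
print(typical, outlier)
```

With a Gaussian component every observation would effectively carry weight 1, which is why a few extreme curves can distort the fitted clusters.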

Citations: 0

Finite mixture of hidden Markov models for tensor-variate time series data

IF 1.4; Zone 4, Computer Science

Advances in Data Analysis and Classification Pub Date : 2023-04-29 DOI: 10.1007/s11634-023-00540-y

Abdullah Asilkalkan, Xuwen Zhu, Shuchismita Sarkar

Citations: 0

Application of distance standard deviation in functional data analysis

IF 1.4; Zone 4, Computer Science

Advances in Data Analysis and Classification Pub Date : 2023-04-21 DOI: 10.1007/s11634-023-00538-6

Mirosław Krzyśko, Łukasz Smaga

Citations: 0

An enhanced version of the SSA-HJ-biplot for time series with complex structure

IF 1.4; Zone 4, Computer Science

Advances in Data Analysis and Classification Pub Date : 2023-04-18 DOI: 10.1007/s11634-023-00541-x

Alberto Silva, Adelaide Freitas

Citations: 0

Composite likelihood methods for parsimonious model-based clustering of mixed-type data

IF 1.4; Zone 4, Computer Science

Advances in Data Analysis and Classification Pub Date : 2023-04-09 DOI: 10.1007/s11634-023-00539-5

Monia Ranalli, Roberto Rocci

Citations: 0

Identification of representative trees in random forests based on a new tree-based distance measure

IF 1.4; Zone 4, Computer Science

Advances in Data Analysis and Classification Pub Date : 2023-03-16 DOI: 10.1007/s11634-023-00537-7

Björn-Hergen Laabs, Ana Westenberger, Inke R. König

Abstract: In life sciences, random forests are often used to train predictive models. However, gaining any explanatory insight into the mechanics leading to a specific outcome is rather complex, which impedes the implementation of random forests into clinical practice. By simplifying a complex ensemble of decision trees to a single most representative tree, it is assumed to be possible to observe common tree structures, the importance of specific features and variable interactions. Thus, representative trees could also help to understand interactions between genetic variants. Intuitively, representative trees are those with the minimal distance to all other trees, which requires a proper definition of the distance between two trees. Thus, we developed a new tree-based distance measure, which incorporates more of the underlying tree structure than other metrics. We compared our new method with the existing metrics in an extensive simulation study and applied it to predict the age at onset based on a set of genetic risk factors in a clinical data set. In our simulation study we were able to show the advantages of our weighted splitting variable approach. Our real data application revealed that representative trees are not only able to replicate the results from a recent genome-wide association study, but also can give additional explanations of the genetic mechanisms. Finally, we implemented all compared distance measures in R and made them publicly available in the R package timbR (https://github.com/imbs-hl/timbR).

Journal reference: Advances in Data Analysis and Classification, vol. 18, no. 2, pp. 363–380. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s11634-023-00537-7.pdf
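The selection rule stated in the abstract, that representative trees are those with the minimal distance to all other trees, can be sketched once a pairwise distance matrix is available. The example below assumes such a matrix has already been computed by some tree distance; it does not reproduce the paper's weighted splitting variable measure or the timbR API, and the function name and toy distances are hypothetical.

```python
from typing import Sequence


def most_representative(dist: Sequence[Sequence[float]]) -> int:
    """Return the index of the tree with the smallest total distance to all trees."""
    totals = [sum(row) for row in dist]
    return min(range(len(totals)), key=totals.__getitem__)


# Hypothetical symmetric distances between four trees (zero diagonal).
toy_distances = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.3, 0.6],
    [0.5, 0.3, 0.0, 0.4],
    [0.9, 0.6, 0.4, 0.0],
]
representative = most_representative(toy_distances)
print(representative)
```

The choice of distance drives the result: two measures that rank tree similarity differently will generally pick different representative trees from the same forest.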

Citations: 0

Threshold-based Naïve Bayes classifier

IF 1.4; Zone 4, Computer Science

Advances in Data Analysis and Classification Pub Date : 2023-03-14 DOI: 10.1007/s11634-023-00536-8

Maurizio Romano, Giulia Contu, Francesco Mola, Claudio Conversano

Abstract: The Threshold-based Naïve Bayes (Tb-NB) classifier is introduced as a (simple) improved version of the original Naïve Bayes classifier. Tb-NB extracts the sentiment from a natural language text corpus and allows the user not only to predict how positive (negative) a sentence is but also to quantify a sentiment with a numeric value. It is based on the estimation of a single threshold value that contributes to defining a decision rule that classifies a text as a positive (negative) opinion based on its content. One of the main advantages of Tb-NB is the possibility of using its results as the input of post-hoc analyses aimed at observing how the quality associated with the different dimensions of a product or a service or, in a mirrored fashion, the different dimensions of customer satisfaction evolve in time or change with respect to different locations. The effectiveness of Tb-NB is evaluated by analyzing data concerning the tourism industry and, specifically, hotel guests' reviews from all hotels located in the Sardinian region and available on Booking.com. Moreover, Tb-NB is compared with other popular classifiers used in sentiment analysis in terms of model accuracy, resistance to noise and computational efficiency.

Journal reference: Advances in Data Analysis and Classification, vol. 18, no. 2, pp. 325–361. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s11634-023-00536-8.pdf
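The idea of classifying a text by comparing a numeric sentiment score with a single threshold can be sketched as follows. The abstract does not give the Tb-NB scoring formula or its threshold estimation procedure, so this example substitutes a generic Laplace-smoothed Naïve Bayes log-odds score and a hand-picked threshold `tau`; all names and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def train_log_odds(pos_docs, neg_docs, alpha=1.0):
    """Per-word log-odds log P(w | positive) - log P(w | negative), Laplace-smoothed."""
    pos_counts = Counter(w for d in pos_docs for w in d.split())
    neg_counts = Counter(w for d in neg_docs for w in d.split())
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + alpha * len(vocab)
    neg_total = sum(neg_counts.values()) + alpha * len(vocab)
    return {
        w: math.log((pos_counts[w] + alpha) / pos_total)
        - math.log((neg_counts[w] + alpha) / neg_total)
        for w in vocab
    }


def score(text, log_odds):
    """Numeric sentiment value: sum of word log-odds (unknown words contribute 0)."""
    return sum(log_odds.get(w, 0.0) for w in text.split())


def classify(text, log_odds, tau=0.0):
    """Decision rule: positive if the score exceeds the threshold tau."""
    return "positive" if score(text, log_odds) > tau else "negative"


# Hypothetical toy reviews.
log_odds = train_log_odds(
    pos_docs=["clean room friendly staff", "great location clean"],
    neg_docs=["dirty room rude staff", "noisy dirty"],
)
print(classify("clean friendly staff", log_odds))
print(classify("dirty noisy", log_odds))
```

Raising `tau` makes the rule stricter about calling a review positive, which is the lever a threshold-based classifier tunes; here it is set by hand rather than estimated from data.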

Citations: 0

Editorial for ADAC issue 1 of volume 17 (2023)

IF 1.6; Zone 4, Computer Science

Advances in Data Analysis and Classification Pub Date : 2023-02-17 DOI: 10.1007/s11634-023-00535-9

Maurizio Vichi, Andrea Cerioli, Hans A. Kestler, Akinori Okada, Claus Weihs

Citations: 0

Clustering data with non-ignorable missingness using semi-parametric mixture models assuming independence within components

IF 1.6; Zone 4, Computer Science

Advances in Data Analysis and Classification Pub Date : 2023-02-12 DOI: 10.1007/s11634-023-00534-w

Marie du Roy de Chaumaray, Matthieu Marbac

Abstract: We propose a semi-parametric clustering model assuming conditional independence given the component. One advantage is that this model can handle non-ignorable missingness. The model defines each component as a product of univariate probability distributions but makes no assumption on the form of each univariate density. Note that the mixture model is used for clustering but not for estimating the density of the full variables (observed and unobserved). Estimation is performed by maximizing an extension of the smoothed likelihood allowing missingness. This optimization is achieved by a Majorization-Minorization algorithm. We illustrate the relevance of our approach by numerical experiments conducted on simulated data. Under mild assumptions, we show the identifiability of the model defining the distribution of the observed data and the monotonicity of the algorithm. We also propose an extension of this new method to the case of mixed-type data that we illustrate on a real data set. The proposed method is implemented in the R package MNARclust available on CRAN.

Journal reference: Advances in Data Analysis and Classification, vol. 17, no. 4, pp. 1081–1122.

Citations: 0