Advances in Data Analysis and Classification: Latest Articles

Contamination transformation matrix mixture modeling for skewed data groups with heavy tails and scatter
IF 1.4 · Zone 4 · Computer Science
Advances in Data Analysis and Classification · Pub Date: 2023-09-13 · DOI: 10.1007/s11634-023-00550-w
Xuwen Zhu, Yana Melnykov, Angelina S. Kolomoytseva
Volume 18, Issue 1, pp. 85-101
Abstract: Model-based clustering is a popular application of the rapidly developing area of finite mixture modeling. While there is ample work on clustering multivariate data, a growing number of advances aim to extend existing theory to the matrix-variate framework. Matrix-variate Gaussian mixtures are the most popular choice in this setting despite their potential misfit for skewed and heavy-tailed data. To overcome this lack of flexibility, a new contaminated transformation matrix mixture model is proposed. We illustrate its utility in a series of experiments on simulated data and apply it to a real-life data set containing COVID-related information. The developed model performs promisingly in all settings considered.
Citations: 0
An analytic strategy for data processing of multimode networks
IF 1.4 · Zone 4 · Computer Science
Advances in Data Analysis and Classification · Pub Date: 2023-08-29 · DOI: 10.1007/s11634-023-00556-4
Vincenzo Giuseppe Genova, Giuseppe Giordano, Giancarlo Ragozini, Maria Prosperina Vitale
Volume 18, Issue 3, pp. 745-767 · Open access PDF: https://link.springer.com/content/pdf/10.1007/s11634-023-00556-4.pdf
Abstract: Complex network data structures are considered to capture the richness of social phenomena and real-life data settings. Multipartite networks are one example, in which various scenarios are represented by different types of relations, actors, or modes. Within this context, the present contribution discusses an analytic strategy for simplifying multipartite networks in which different sets of nodes are linked. Building on the connection between multimode networks and hypergraphs as theoretical concepts, a three-step procedure is introduced to simplify, normalize, and filter network data structures. A model-based approach is then introduced for the derived bipartite weighted networks in order to extract statistically significant links. The usefulness of the strategy is demonstrated in two application fields: intranational student mobility in higher education and research collaboration in European framework programs. Finally, both examples are explored using community detection algorithms to determine the presence of groups that mix different modes.
Citations: 0
Robust gradient boosting for generalized additive models for location, scale and shape
IF 1.6 · Zone 4 · Computer Science
Advances in Data Analysis and Classification · Pub Date: 2023-08-26 · DOI: 10.1007/s11634-023-00555-5
Jan Speller, C. Staerk, Francisco Gude, A. Mayr
Volume 24, Issue 1
Citations: 0
Editorial for ADAC issue 3 of volume 17 (2023)
IF 1.6 · Zone 4 · Computer Science
Advances in Data Analysis and Classification · Pub Date: 2023-08-03 · DOI: 10.1007/s11634-023-00554-6
Maurizio Vichi, Andrea Cerioli, Hans A. Kestler, Akinori Okada, Claus Weihs
Volume 17, Issue 3, pp. 545-548
Citations: 0
On the efficient implementation of classification rule learning
IF 1.4 · Zone 4 · Computer Science
Advances in Data Analysis and Classification · Pub Date: 2023-07-27 · DOI: 10.1007/s11634-023-00553-7
Michael Rapp, Johannes Fürnkranz, Eyke Hüllermeier
Volume 18, Issue 4, pp. 851-892 · Open access PDF: https://link.springer.com/content/pdf/10.1007/s11634-023-00553-7.pdf
Abstract: Rule learning methods have a long history of active research in the machine learning community. They are not only a common choice in applications that demand human-interpretable classification models but have also been shown to achieve state-of-the-art performance when used in ensemble methods. Unfortunately, little information can be found in the literature about the various implementation details that are crucial for the efficient induction of rule-based models. This work provides a detailed discussion of algorithmic concepts and approximations that enable rule learning techniques to be applied to large amounts of data. To demonstrate the advantages and limitations of these individual concepts in a series of experiments, we rely on BOOMER, a flexible and publicly available implementation for the efficient induction of gradient boosted single- or multi-label classification rules.
Citations: 0
Model-based clustering using a new multivariate skew distribution
IF 1.4 · Zone 4 · Computer Science
Advances in Data Analysis and Classification · Pub Date: 2023-07-22 · DOI: 10.1007/s11634-023-00552-8
Salvatore D. Tomarchio, Luca Bagnato, Antonio Punzo
Volume 18, Issue 1, pp. 61-83 · Open access PDF: https://link.springer.com/content/pdf/10.1007/s11634-023-00552-8.pdf
Abstract: Quite often, real data exhibit non-normal features, such as asymmetry and heavy tails, and present a latent group structure. In this paper, we first propose the multivariate skew shifted exponential normal distribution, which can account for these non-normal characteristics. We then use this distribution in a finite mixture modeling framework. An EM algorithm is illustrated for maximum-likelihood parameter estimation. We provide a simulation study that compares the fitting performance of our model with that of several alternative models. The comparison is also conducted on a real dataset concerning the log returns of four cryptocurrencies.
Citations: 0
A topological data analysis based classifier
IF 1.4 · Zone 4 · Computer Science
Advances in Data Analysis and Classification · Pub Date: 2023-07-01 · DOI: 10.1007/s11634-023-00548-4
Rolando Kindelan, José Frías, Mauricio Cerda, Nancy Hitschfeld
Volume 18, Issue 2, pp. 493-538
Abstract: Topological Data Analysis (TDA) is an emerging field that aims to discover a dataset's underlying topological information. TDA tools have commonly been used to create filters and topological descriptors to improve Machine Learning (ML) methods. This paper proposes a different TDA pipeline to classify balanced and imbalanced multi-class datasets without additional ML methods. Our proposed method was designed to solve multi-class and imbalanced classification problems without a data-resampling preprocessing stage. The proposed TDA-based classifier (TDABC) builds a filtered simplicial complex on the dataset that represents high-order data relationships. Assuming that the filtration contains a meaningful sub-complex that approximates the data topology, we apply Persistent Homology (PH) to guide the selection of that sub-complex by considering the detected topological features. We use each unlabeled point's link and star operators to provide neighborhoods of different sizes and dimensions through which labels are propagated from labeled to unlabeled points. The labeling function depends on the entire history of the filtration of the simplicial complex and is encoded within the persistence diagrams at various dimensions. To validate our method, we select eight datasets with different dimensions, degrees of class overlap, and levels of class imbalance. The TDABC outperforms all baseline methods in classifying multi-class imbalanced data with high imbalance ratios and data with overlapping classes. Also, on average, the proposed method performed better than k-nearest neighbors (KNN) and weighted KNN and competitively with Support Vector Machine and Random Forest baseline classifiers on balanced datasets.
Citations: 0
A link function specification test in the single functional index model
IF 1.6 · Zone 4 · Computer Science
Advances in Data Analysis and Classification · Pub Date: 2023-06-22 · DOI: 10.1007/s11634-023-00545-7
Lax Chan, L. Delsol, A. Goia
Volume 68, Issue 1
Citations: 1
MLE for the parameters of bivariate interval-valued model
IF 1.4 · Zone 4 · Computer Science
Advances in Data Analysis and Classification · Pub Date: 2023-06-18 · DOI: 10.1007/s11634-023-00546-6
S. Yaser Samadi, L. Billard, Jiin-Huarng Guo, Wei Xu
Volume 18, Issue 4, pp. 827-850
Abstract: As contemporary data sets become too large to analyze directly, various forms of aggregated data are becoming common. The original individual data are points, but after aggregation the observations are interval-valued. While some researchers simply analyze the set of averages of the observations by aggregated class, it is easily established that this approach ignores much of the information in the original data set. The initial theoretical work for interval-valued data was that of Le-Rademacher and Billard (J Stat Plan Infer 141:1593–1602, 2011), but those results were limited to estimating the mean and variance of a single variable. This article seeks to redress that limitation by deriving the maximum likelihood estimator for the all-important covariance statistic, a basic requirement for numerous methodologies such as regression, principal components, and canonical analyses. Asymptotic properties of the proposed estimators are established. The Le-Rademacher and Billard results emerge as special cases of our wider derivations.
Citations: 0
Multivariate count time series segmentation with “sums and shares” and Poisson lognormal mixture models: a comparative study using pedestrian flows within a multimodal transport hub
IF 1.4 · Zone 4 · Computer Science
Advances in Data Analysis and Classification · Pub Date: 2023-05-29 · DOI: 10.1007/s11634-023-00543-9
Paul de Nailly, Etienne Côme, Latifa Oukhellou, Allou Samé, Jacques Ferriere, Yasmine Merad-Boudia
Volume 18, Issue 2, pp. 455-491
Abstract: This paper deals with a clustering approach based on mixture models to analyze multidimensional mobility count time-series data within a multimodal transport hub. These time series are very likely to evolve across periods characterized by strikes, maintenance works, or health measures against the COVID-19 pandemic. In addition, exogenous one-off factors, such as concerts and transport disruptions, can also impact mobility. Our approach flexibly detects time segments within which the very noisy count data are synthesized into regular spatio-temporal mobility profiles. At the upper level of the model, evolving mixing weights are designed to detect segments properly. At the lower level, segment-specific count regression models account for correlations between series and overdispersion, as well as for the impact of exogenous factors. For this purpose, we set up and compare two promising strategies that can address this issue, namely the “sums and shares” and “Poisson log-normal” models. The proposed methodologies are applied to actual data collected within a multimodal transport hub in the Paris region, using ticketing logs and pedestrian counts provided by stereo cameras. Experiments are carried out to show the ability of the statistical models to highlight mobility patterns within the transport hub. One model is chosen based on its ability to detect the most continuous segments possible while fitting the count time series well. Finally, an in-depth analysis of the time segmentation, mobility patterns, and impact of exogenous factors obtained with the chosen model is performed.
Citations: 0