Advances in Data Analysis and Classification: Latest Articles, Page 10

A structured covariance ensemble for sufficient dimension reduction

IF 1.6, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2022-10-19 DOI: 10.1007/s11634-022-00524-4

Qin Wang, Yuan Xue

Citations: 1

Semiparametric finite mixture of regression models with Bayesian P-splines

IF 1.6, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2022-10-18 DOI: 10.1007/s11634-022-00523-5

Marco Berrettini, Giuliano Galimberti, Saverio Ranciati

Abstract: Mixture models provide a useful tool to account for unobserved heterogeneity and are at the basis of many model-based clustering methods. To gain additional flexibility, some model parameters can be expressed as functions of concomitant covariates. In this paper, a semiparametric finite mixture of regression models is defined, with concomitant information assumed to influence both the component weights and the conditional means. In particular, linear predictors are replaced with smooth functions of the covariate considered by resorting to cubic splines. An estimation procedure within the Bayesian paradigm is suggested, where smoothness of the covariate effects is controlled by suitable choices for the prior distributions of the spline coefficients. A data augmentation scheme based on difference random utility models is exploited to describe the mixture weights as functions of the covariate. The performance of the proposed methodology is investigated via simulation experiments and two real-world datasets, one about baseball salaries and the other concerning nitrogen oxide in engine exhaust.

Volume 17, Issue 3, pp. 745-775. PDF: https://link.springer.com/content/pdf/10.1007/s11634-022-00523-5.pdf

Citations: 1

On smoothing and scaling language model for sentiment based information retrieval

IF 1.6, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2022-10-13 DOI: 10.1007/s11634-022-00522-6

Fatma Najar, Nizar Bouguila

Citations: 1

The role of diversity and ensemble learning in credit card fraud detection

IF 1.4, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2022-09-28 DOI: 10.1007/s11634-022-00515-5

Gian Marco Paldino, Bertrand Lebichot, Yann-Aël Le Borgne, Wissam Siblini, Frédéric Oblé, Giacomo Boracchi, Gianluca Bontempi

Abstract: The number of daily credit card transactions is inexorably growing: the e-commerce market expansion and the recent constraints due to the Covid-19 pandemic have significantly increased the use of electronic payments. The ability to precisely detect fraudulent transactions is increasingly important, and machine learning models are now a key component of the detection process. Standard machine learning techniques are widely employed, but they are inadequate for the evolving nature of customers' behavior, which entails continuous changes in the underlying data distribution. This problem is often tackled by discarding past knowledge, despite its potential relevance in the case of recurrent concepts. Appropriate exploitation of historical knowledge is necessary: we propose a learning strategy that relies on diversity-based ensemble learning and allows past concepts to be preserved and reused for a faster adaptation to changes. In our experiments, we adopt several state-of-the-art diversity measures and we perform comparisons with various other learning approaches. We assess the effectiveness of our proposed learning strategy on extracts of two real datasets from two European countries, containing more than 30 M and 50 M transactions, provided by our industrial partner, Worldline, a leading company in the field.

Volume 18, Issue 1, pp. 193-217.

Citations: 0

Benchmarking distance-based partitioning methods for mixed-type data

IF 1.6, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2022-09-22 DOI: 10.1007/s11634-022-00521-7

Efthymios Costa, Ioanna Papatsouma, Angelos Markos

Citations: 3

New models for symbolic data analysis

IF 1.6, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2022-09-19 DOI: 10.1007/s11634-022-00520-8

Boris Beranger, Huan Lin, Scott Sisson

Abstract: Symbolic data analysis (SDA) is an emerging area of statistics concerned with understanding and modelling data that takes distributional form (i.e. symbols), such as random lists, intervals and histograms. It was developed under the premise that the statistical unit of interest is the symbol, and that inference is required at this level. Here we consider a different perspective, which opens a new research direction in the field of SDA. We assume that, as with a standard statistical analysis, inference is required at the level of individual-level data. However, the individual-level data are unobserved, and are aggregated into observed symbols (group-based, distributional-valued summaries) prior to the analysis. We introduce a novel general method for constructing likelihood functions for symbolic data based on a desired probability model for the underlying measurement-level data, while only observing the distributional summaries. This approach opens the door for new classes of symbol design and construction, in addition to developing SDA as a viable tool to enable and improve upon classical data analyses, particularly for very large and complex datasets. We illustrate this new direction for SDA research through several real and simulated data analyses, including a study of novel classes of multivariate symbol construction techniques.

Volume 17, Issue 3, pp. 659-699. PDF: https://link.springer.com/content/pdf/10.1007/s11634-022-00520-8.pdf

Citations: 14

Slice weighted average regression

IF 1.6, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2022-09-10 DOI: 10.1007/s11634-023-00551-9

Marina Masioti, Joshua J. Davies, Amanda Shaker, L. Prendergast

Citations: 0

Robust regression for interval-valued data based on midpoints and log-ranges

IF 1.6, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2022-09-05 DOI: 10.1007/s11634-022-00518-2

Qing Zhao, Huiwen Wang, Shanshan Wang

Abstract: Flexible modelling of interval-valued data is of great practical importance with the development of advanced technologies in current data collection processes. This paper proposes a new robust regression model for interval-valued data based on midpoints and log-ranges of the dependent intervals, and obtains the parameter estimators using the Huber loss function to deal with possible outliers in a data set. In addition, the logarithm transformation avoids the non-negativity constraints of traditional range modelling, which makes it easier to apply various regression methods. We conduct extensive Monte Carlo simulation experiments to compare the finite-sample performance of our model with that of the existing regression methods for interval-valued data. Results indicate that the proposed method shows competitive performance, especially for data sets containing outliers and in scenarios where both midpoints and ranges of the independent variables are related to those of the dependent one. Moreover, two empirical interval-valued data sets are used to illustrate the effectiveness of our method.

Volume 17, Issue 3, pp. 583-621. PDF: https://link.springer.com/content/pdf/10.1007/s11634-022-00518-2.pdf
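
The two ingredients named in the abstract above, the midpoint/log-range representation of an interval and the Huber loss, can be sketched as follows. This is only an illustration and not the authors' estimator: the function names are hypothetical, and the default threshold delta=1.345 is a conventional tuning constant assumed here, not a value taken from the paper.

```python
import math


def to_midpoint_log_range(lower, upper):
    """Map an interval [lower, upper] to (midpoint, log of its range).

    The log is defined only for a strictly positive range.
    """
    if upper <= lower:
        raise ValueError("log-range requires upper > lower")
    return (lower + upper) / 2.0, math.log(upper - lower)


def from_midpoint_log_range(mid, log_range):
    """Invert the transform; exp() keeps the range positive for any real input."""
    half = math.exp(log_range) / 2.0
    return mid - half, mid + half


def huber_loss(residual, delta=1.345):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it.

    delta=1.345 is an assumed default, not a value from the paper.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

Because any real-valued prediction of the log-range maps back to a positive range, an unconstrained regression can be fitted on that scale, which is the point the abstract makes about avoiding non-negativity constraints.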

Citations: 1

Band depth based initialization of K-means for functional data clustering

IF 1.6, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2022-09-03 DOI: 10.1007/s11634-022-00510-w

Javier Albert-Smet, Aurora Torrente, Juan Romo

Abstract: The k-Means algorithm is one of the most popular choices for clustering data but is well-known to be sensitive to the initialization process. There is a substantial number of methods that aim at finding optimal initial seeds for k-Means, though none of them is universally valid. This paper presents an extension to longitudinal data of one such method, the BRIk algorithm, which relies on clustering a set of centroids derived from bootstrap replicates of the data and on the use of the versatile Modified Band Depth. In our approach we improve the BRIk method by adding a step where we fit appropriate B-splines to our observations and a resampling process that allows computational feasibility and the handling of issues such as noise or missing data. We have derived two techniques for providing suitable initial seeds, stressing respectively the multivariate or the functional nature of the data. Our results with simulated and real data sets indicate that our Functional Data Approach to the BRIk method (FABRIk) and our Functional Data Extension of the BRIk method (FDEBRIk) are more effective than previous proposals at providing seeds to initialize k-Means in terms of clustering recovery.

Volume 17, Issue 2, pp. 463-484. PDF: https://link.springer.com/content/pdf/10.1007/s11634-022-00510-w.pdf
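
For readers unfamiliar with the Modified Band Depth mentioned in the abstract above, a minimal sketch is given here. It assumes curves sampled on a common grid and bands formed by pairs of curves; the function name is hypothetical, and this is not the FABRIk or FDEBRIk procedure itself, only the depth measure it builds on.

```python
from itertools import combinations


def modified_band_depth(curves):
    """Return the Modified Band Depth of each curve in the sample.

    curves: list of equal-length lists, all sampled on the same grid.
    For each curve, average over all pairs of sample curves the
    proportion of grid points at which the curve lies inside the band
    delimited by that pair.
    """
    n_points = len(curves[0])
    pairs = list(combinations(range(len(curves)), 2))
    depths = []
    for x in curves:
        total = 0.0
        for i, j in pairs:
            inside = sum(
                1
                for t in range(n_points)
                if min(curves[i][t], curves[j][t])
                <= x[t]
                <= max(curves[i][t], curves[j][t])
            )
            total += inside / n_points
        depths.append(total / len(pairs))
    return depths
```

The curve with the largest depth is the most central one in the sample, which is why a depth of this kind can serve to pick a representative seed from a group of candidate centroids.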

Citations: 2

Nonparametric regression and classification with functional, categorical, and mixed covariates

IF 1.6, Zone 4 (Computer Science)

Advances in Data Analysis and Classification Pub Date : 2022-09-02 DOI: 10.1007/s11634-022-00513-7

Leonie Selk, Jan Gertheiss

Citations: 4