{"title":"Editorial for ADAC issue 3 of volume 19 (2025)","authors":"Maurizio Vichi, Andrea Cerioli, Hans A. Kestler","doi":"10.1007/s11634-025-00652-7","DOIUrl":"10.1007/s11634-025-00652-7","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 3","pages":"545 - 549"},"PeriodicalIF":1.3,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145078993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing biases can be more efficient than increasing weights","authors":"Carlo Metta, Marco Fantozzi, Andrea Papini, Gianluca Amato, Matteo Bergamaschi, Andrea Fois, Silvia Giulia Galfrè, Alessandro Marchetti, Michelangelo Vegliò, Maurizio Parton, Francesco Morandin","doi":"10.1007/s11634-025-00649-2","DOIUrl":"10.1007/s11634-025-00649-2","url":null,"abstract":"<div><p>We introduce a novel computational unit for neural networks that features multiple biases, challenging the traditional perceptron structure. This unit emphasizes the importance of preserving uncorrupted information as it is passed from one unit to the next, applying activation functions later in the process with specialized biases for each unit. Through both empirical and theoretical analyses, we show that by focusing on increasing biases rather than weights, there is potential for significant enhancement in a neural network model’s performance. This approach offers an alternative perspective on optimizing information flow within neural networks. See source code (CurioSAI in Increasing biases can be more efficient than increasing weights, 2023. https://github.com/CuriosAI/dac-dev).</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"437 - 468"},"PeriodicalIF":1.3,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145166365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special issue on “Advances in clustering, classification and related methods”","authors":"Paolo Giordani, Christian Hennig, Julien Jacques, Carla Rampichini","doi":"10.1007/s11634-025-00645-6","DOIUrl":"10.1007/s11634-025-00645-6","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"271 - 273"},"PeriodicalIF":1.3,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145170373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variational inference for estimating dynamic stochastic block models through an evolutionary algorithm","authors":"Luca Brusa, Fulvia Pennoni","doi":"10.1007/s11634-025-00634-9","DOIUrl":"10.1007/s11634-025-00634-9","url":null,"abstract":"<div><p>Dynamic temporal networks are important structures to capture node dependencies and their evolution over time. The dynamic stochastic block model, commonly used with longitudinal network data, is estimated by maximizing the likelihood function through the variational expectation-maximization (VEM) algorithm. However, maximization is challenging due to the presence of multiple local maxima. In this paper, we first conduct a simulation study to assess the performance of six different parameter initialization strategies. Second, we introduce a novel specification of the VEM through a genetic algorithm, enabling a more comprehensive exploration of the parameter space. Results from both simulations and historical data on infectious disease transmission highlight the advantages of this approach in overcoming convergence to local maxima and improving node clustering in temporal network data.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"469 - 492"},"PeriodicalIF":1.3,"publicationDate":"2025-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-025-00634-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145168374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing flexible modelling approaches: the varying-thresholds model versus quantile regression","authors":"Niccolò Ducci, Leonardo Grilli, Marta Pittavino","doi":"10.1007/s11634-025-00635-8","DOIUrl":"10.1007/s11634-025-00635-8","url":null,"abstract":"<div><p>The varying-thresholds model (VTM) is a novel methodology proposed by Tutz (Flexible predictive distributions from varying-thresholds modelling. https://doi.org/10.48550/arXiv.2103.13324, arXiv:2103.13324, 2021) capable of estimating the whole conditional distribution of a response variable in a regression setting. It can be used for continuous, ordinal and count responses. In this study, conditional quantiles and prediction intervals estimated through VTM are compared with those of quantile regression. The comparison is based on a set of data-generating models to assess the performance of the two methodologies regarding the coverage and width of prediction intervals. The simulation study encompasses settings with several functional forms and types of errors. In addition, a discrete version of the continuous ranked probability score is proposed as a tool to choose the best link function for the binary models used in the fitting of VTM. In summary, the varying-thresholds model is a flexible methodology that can be broadly applied with light assumptions; it is advantageous over quantile regression when the conditional quantile function is misspecified.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"493 - 514"},"PeriodicalIF":1.3,"publicationDate":"2025-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-025-00635-8.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145165120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When non-response makes estimates from a census a small area estimation problem: the case of the survey on graduates’ employment status in Italy","authors":"Maria Giovanna Ranalli, Fulvia Pennoni, Francesco Bartolucci, Antonietta Mira","doi":"10.1007/s11634-025-00630-z","DOIUrl":"10.1007/s11634-025-00630-z","url":null,"abstract":"<div><p>Since 1998, AlmaLaurea, a consortium of 80 Italian universities and a member of the Italian National Statistical System, has conducted an annual census on graduates’ employment status. The survey provides estimates of descriptive indicators at both the population level and for specific subpopulations (domains) of interest, such as degree programmes. Some domains have very few observations due to a small population size and non-response. In this paper, we address this estimation problem within a Small Area Estimation framework. Specifically, we propose using generalized linear mixed models that incorporate two variables as proxies for graduates’ response propensity, making the assumption of non-informative non-response more plausible. Degree programme estimates of employment rates are derived as (semi-parametric) empirical best predictions using a finite mixture of logistic regression models, with their mean squared error estimated via a second-order, bias-corrected, analytical estimator. Sensitivity analysis is conducted to assess the explanatory power of variables modelling response propensity and to evaluate potential correlations between area-specific random effects and observed heterogeneity.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"515 - 543"},"PeriodicalIF":1.3,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-025-00630-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145163429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial for ADAC issue 1 of volume 19 (2025)","authors":"Maurizio Vichi, Andrea Cerioli, Hans A. Kestler","doi":"10.1007/s11634-025-00629-6","DOIUrl":"10.1007/s11634-025-00629-6","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 1","pages":"1 - 4"},"PeriodicalIF":1.4,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143707011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random models for adjusting fuzzy rand index extensions","authors":"Ryan DeWolfe, Jeffrey L. Andrews","doi":"10.1007/s11634-025-00625-w","DOIUrl":"10.1007/s11634-025-00625-w","url":null,"abstract":"<div><p>The adjusted Rand index (ARI) is a widely used method for comparing hard clusterings, but requires a choice of random model that is often left implicit. Several recent works have extended the Rand index to fuzzy clusterings and adjusted for chance agreement with the permutation model, but the assumptions of this random model are difficult to justify for fuzzy clusterings. Previous work on random models for hard clusterings has shown that different random models can impact similarity rankings, so matching the assumptions of the random model to the algorithm is essential. We propose a single framework computing the ARI with three new random models that are intuitive and explainable for both hard and fuzzy clusterings. The theory and assumptions of the proposed models are contrasted with the existing permutation model, and computations on synthetic and benchmark data show that each model has distinct behaviour, meaning accurate model selection is important for the reliability of results.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"361 - 385"},"PeriodicalIF":1.3,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145165581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A consensus-constrained parsimonious Gaussian mixture model for clustering hyperspectral images","authors":"Ganesh Babu, Aoife Gowen, Michael Fop, Isobel Claire Gormley","doi":"10.1007/s11634-025-00623-y","DOIUrl":"10.1007/s11634-025-00623-y","url":null,"abstract":"<div><p>The use of hyperspectral imaging to investigate food samples has grown due to the improved performance and lower cost of instrumentation. Food engineers use hyperspectral images to classify the type and quality of a food sample, typically using classification methods. In order to train these methods, every pixel in each training image needs to be labelled. Typically, computationally cheap threshold-based approaches are used to label the pixels, and classification methods are trained based on those labels. However, threshold-based approaches are subjective and cannot be generalized across hyperspectral images taken in different conditions and of different foods. Here a consensus-constrained parsimonious Gaussian mixture model (ccPGMM) is proposed to label pixels in hyperspectral images using a model-based clustering approach. The ccPGMM utilizes information that is available on some pixels and specifies constraints on those pixels belonging to the same or different clusters while clustering the rest of the pixels in the image. A latent variable model is used to represent the high-dimensional data in terms of a small number of underlying latent factors. To ensure computational feasibility, a consensus clustering approach is employed, where the data are divided into multiple randomly selected subsets of variables and constrained clustering is applied to each data subset; the clustering results are then consolidated across all data subsets to provide a consensus clustering solution. The ccPGMM approach is applied to simulated datasets and real hyperspectral images of three types of puffed cereal: corn, rice, and wheat. Improved clustering performance and computational efficiency are demonstrated when compared to other current state-of-the-art approaches.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"323 - 359"},"PeriodicalIF":1.3,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-025-00623-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145162774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering and classification of spatio-temporal data using spatial dynamic panel data models","authors":"Giuseppe Feo, Francesco Giordano, Sara Milito, Marcella Niglio, Maria Lucia Parrella","doi":"10.1007/s11634-024-00620-7","DOIUrl":"10.1007/s11634-024-00620-7","url":null,"abstract":"<div><p>The class of <i>Spatial Dynamic Panel Data</i> models has been proposed in the socio-econometric literature to analyze spatio-temporal data. In this paper we consider a particular variant of such models, where the set of spatial units is assumed to be partitioned into clusters and the parameters of the model are assumed to be homogeneous within clusters and heterogeneous across clusters. For this model, assuming that the true partition is unknown, we propose a new clustering procedure and a validation test, based on a multiple testing approach, that help to choose the best model configuration for a given observed dataset by estimating the optimal number of clusters and the best partition of units. The validity of the proposed procedures has been shown both theoretically and empirically, on simulated and real data, and compared with alternative methods.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"19 2","pages":"387 - 435"},"PeriodicalIF":1.3,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145167837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}