A model-based approach for clustering binned data
Asael Fabian Martínez, Carlos Díaz-Avalos
arXiv - STAT - Statistics Theory, 2024-09-12
DOI: arxiv-2409.07738 (https://doi.org/arxiv-2409.07738)
Abstract
Binned data appear in many fields of research; they are generated by
summarizing the original data as a sequence of pairs of bins (or their
midpoints) and frequencies. There may be various reasons for providing only
this summary, but more importantly, it is necessary to be able to perform
statistical analyses based on it alone. We present a Bayesian nonparametric
model for clustering that is applicable to binned data. Clusters are
modeled via random partitions, and within them a model-based approach is
assumed. Inferences are performed by a Markov chain Monte Carlo method and the
complete proposal is tested using simulated and real data. With a particular
interest in studying marine populations, we analyze length samples of Lobatus
(Strombus) gigas and find up to three cohorts over the course of
the year.
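
As a rough illustration of what "binned data" means here, the sketch below summarizes raw values as (midpoint, frequency) pairs and evaluates a likelihood that uses only that summary. It is not the authors' model: the Gaussian mixture with a fixed number of components, the function names (`bin_data`, `normal_cdf`, `binned_loglik`), and all numeric values are assumptions chosen for illustration. The paper itself uses random partitions and Markov chain Monte Carlo rather than fixed parameters.

```python
import math

def bin_data(values, edges):
    """Summarize raw values as (midpoint, frequency) pairs.

    Bins are half-open intervals [edges[i], edges[i+1]); values that fall
    outside every bin are silently ignored (an assumption of this sketch).
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    mids = [(edges[i] + edges[i + 1]) / 2 for i in range(len(counts))]
    return list(zip(mids, counts))

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Normal(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binned_loglik(counts, edges, weights, mus, sigmas):
    """Multinomial log-likelihood (up to an additive constant) of bin counts.

    Each bin probability is the mixture mass inside the bin, obtained as a
    difference of CDF values, so only edges and frequencies are needed.
    """
    ll = 0.0
    for i, n in enumerate(counts):
        p = sum(
            w * (normal_cdf(edges[i + 1], m, s) - normal_cdf(edges[i], m, s))
            for w, m, s in zip(weights, mus, sigmas)
        )
        if n > 0:
            ll += n * math.log(max(p, 1e-300))  # guard against log(0)
    return ll

# Hypothetical example: frequencies with two visible modes.
edges = [0, 1, 2, 3, 4, 5]
counts = [5, 1, 0, 1, 5]
one_group = binned_loglik(counts, edges, [1.0], [2.5], [0.5])
two_groups = binned_loglik(counts, edges, [0.5, 0.5], [0.5, 4.5], [0.5, 0.5])
```

In this toy case the two-component configuration attains the higher log-likelihood, which is the kind of signal a clustering model for binned data exploits when deciding how many groups (for example, cohorts) the frequencies support.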