Estimating the Number of Clusters via Proportional Chinese Restaurant Process
Yingying Wen, Hangjin Jiang, Jianwei Yin
Proceedings of the 2020 3rd International Conference on Machine Learning and Machine Intelligence (published 2020-09-18)
DOI: 10.1145/3426826.3426840
Citations: 0
Abstract
Dirichlet Process Mixture (DPM) models tend to produce a few major clusters along with many small ones. These small, confusing clusters overlap heavily with the major clusters. As the sample size grows while the sample distribution stays the same, more and more of these unnecessary small clusters appear in the clustering results. The powered Chinese Restaurant Process (pCRP) was recently proposed to eliminate such spurious small clusters. However, it violates exchangeability, a standard and indispensable assumption of DPM models. In this paper, we propose a new method, the proportional Chinese Restaurant Process (pro-CRP), which preserves exchangeability while reducing the number of unnecessary small clusters. We present experimental results comparing pro-CRP with the CRP and pCRP models and prove that pro-CRP reduces the number of clusters.
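The two background claims above (the CRP keeps opening new clusters as the sample grows, and pCRP breaks exchangeability) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the standard CRP seating rule (join existing table k with weight n_k, open a new table with weight α) and assumes pCRP replaces n_k with n_k**r for some r > 1; that pCRP form and every function name here are illustrative assumptions. The abstract does not state the pro-CRP rule, so the sketch does not implement pro-CRP.

```python
import math
import random


def seat_weights(counts, alpha, r=1.0):
    """Unnormalized seating weights: n_k ** r per occupied table, alpha for a new table.

    r = 1.0 is the standard CRP; r > 1.0 is an assumed form of the powered CRP (pCRP).
    """
    return [n ** r for n in counts] + [alpha]


def sample_table_sizes(n, alpha, r=1.0, seed=0):
    """Seat n customers one at a time and return the resulting table sizes."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        weights = seat_weights(counts, alpha, r)
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[k] += 1    # join an existing table
    return counts


def expected_num_tables(n, alpha):
    """Exact CRP expectation E[K_n] = sum_{i=0}^{n-1} alpha / (alpha + i), roughly alpha * log(n)."""
    return sum(alpha / (alpha + i) for i in range(n))


def sequence_log_prob(labels, alpha, r=1.0):
    """Log-probability of seating customers in the given order with the given table labels."""
    counts = {}
    log_p = 0.0
    for z in labels:
        total = sum(c ** r for c in counts.values()) + alpha
        if z in counts:
            log_p += math.log(counts[z] ** r / total)
            counts[z] += 1
        else:
            log_p += math.log(alpha / total)
            counts[z] = 1
    return log_p


if __name__ == "__main__":
    alpha = 1.0
    # Under the CRP the number of tables keeps growing with n, mostly through small tables.
    for n in (100, 1000, 10000):
        sizes = sample_table_sizes(n, alpha)
        small = sum(1 for c in sizes if c <= 2)
        print(n, len(sizes), round(expected_num_tables(n, alpha), 2), small)

    # Exchangeability check: same partition {x, x, y}, two customer orderings.
    order_a = ["x", "x", "y"]
    order_b = ["x", "y", "x"]
    for r in (1.0, 2.0):
        pa = math.exp(sequence_log_prob(order_a, alpha, r))
        pb = math.exp(sequence_log_prob(order_b, alpha, r))
        print(f"r={r}: P(order_a)={pa:.4f}  P(order_b)={pb:.4f}")
```

With α = 1 and r = 1, both orderings get probability 1/6, as exchangeability requires. With r = 2, the ordering x, x, y drops to 0.1 while x, y, x stays at 1/6, which is the kind of order dependence the abstract attributes to pCRP.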