Hieu Nguyen, Meeta Kalra, Muhammad Azam, N. Bouguila
2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI), 2019-07-01. DOI: 10.1109/IRI.2019.00050 (https://doi.org/10.1109/IRI.2019.00050)
Data Clustering Using Online Variational Learning of Finite Scaled Dirichlet Mixture Models
With massive amounts of data created on a daily basis, the demand for data analysis is ubiquitous. Recent technological developments have made machine learning techniques applicable to a wide range of problems. In this paper, we focus on cluster analysis, an important aspect of data analysis. In particular, being able to automatically discover groups of similar data is crucial for downstream information retrieval and anomaly detection tasks. We therefore propose an online variational inference framework for finite Scaled Dirichlet mixture models. By handling large-scale data efficiently, the online approach enhances the scalability of finite mixture models for demanding real-time applications. The proposed method simultaneously updates the model's parameters and determines the optimal number of components, without the complex computation required by conventional Bayesian algorithms. The effectiveness of our model is demonstrated on challenging problems including spam detection and image clustering.
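The abstract does not give the model's equations, so the following is only a minimal sketch of the basic building block, assuming the commonly used form of the scaled Dirichlet density with shape vector alpha and scale vector beta: p(x | alpha, beta) = Gamma(sum alpha) / prod Gamma(alpha_d) * prod beta_d^alpha_d x_d^(alpha_d - 1) / (sum beta_d x_d)^(sum alpha). The function names and the point-estimate responsibilities are illustrative assumptions; they are not the paper's online variational updates, which work with expectations under an approximate posterior.

```python
import math


def scaled_dirichlet_logpdf(x, alpha, beta):
    """Log density of a scaled Dirichlet at a point x on the simplex.

    Assumed form (not taken from the abstract):
    log Gamma(sum alpha) - sum log Gamma(alpha_d)
      + sum [alpha_d log beta_d + (alpha_d - 1) log x_d]
      - (sum alpha) * log(sum beta_d x_d)
    """
    a_sum = sum(alpha)
    log_norm = math.lgamma(a_sum) - sum(math.lgamma(a) for a in alpha)
    log_num = sum(
        a * math.log(b) + (a - 1.0) * math.log(xi)
        for xi, a, b in zip(x, alpha, beta)
    )
    log_den = a_sum * math.log(sum(b * xi for xi, b in zip(x, beta)))
    return log_norm + log_num - log_den


def responsibilities(x, weights, alphas, betas):
    """Posterior component probabilities for x under fixed point parameters.

    Hypothetical helper: a plain mixture-model posterior, computed with the
    log-sum-exp trick for numerical stability.
    """
    logs = [
        math.log(w) + scaled_dirichlet_logpdf(x, a, b)
        for w, a, b in zip(weights, alphas, betas)
    ]
    m = max(logs)
    exps = [math.exp(v - m) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    x = [0.2, 0.3, 0.5]
    r = responsibilities(
        x,
        weights=[0.5, 0.5],
        alphas=[[2.0, 3.0, 4.0], [5.0, 1.5, 1.5]],
        betas=[[1.0, 1.0, 1.0], [0.5, 2.0, 1.0]],
    )
    print(r)
```

Under this assumed form, setting every entry of beta to the same constant recovers the ordinary Dirichlet density, because the scale terms cancel when the entries of x sum to one. That makes a convenient sanity check for an implementation.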