{"title":"OCA: Overlapping Clustering Application: Unsupervised Approach for Data Analysis","authors":"A. E. Danganan, Ariel M. Sison, Ruji P. Medina","doi":"10.1109/ICIIBMS.2018.8550020","DOIUrl":null,"url":null,"abstract":"In this paper, a new data analysis tool called Overlapping Clustering Application (OCA) was presented. It was developed to identify overlapping clusters and outliers in an unsupervised manner. Python programming language was used for the development of the OCA. One of the methods used is the k-means algorithm, because of its simplicity to solve known clustering issues. The study also considered the use of median and median absolute deviation, it is known to be one of the most robust measures that are easy to use with the presence of outliers. Maxdist (maximum distance of data objects allowed in a cluster) is another method, it is used to identify data objects assigned to multi-cluster. The main function of OCA are composed of three phases. The first phase is to segment data objects into cluster using the k-means algorithm. The second phase is the detection of the abnormal values(outliers) in the datasets using median and median absolute deviation. Finally, the last phase is the identification of overlapping clusters, it uses maxdist as a predictor of data objects that can belong to multiple clusters. Based on the experimental results, the developed OCA demonstrated its capability in terms of detecting the abnormal values (outliers) and identification of clusters with overlaps. 
Experiments revealed that OCA is very useful data analysis tool for data clustering, outlier detection analysis and the detection of overlapping clusters.","PeriodicalId":430326,"journal":{"name":"2018 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIIBMS.2018.8550020","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
This paper presents a new data analysis tool called the Overlapping Clustering Application (OCA). It was developed to identify overlapping clusters and outliers in an unsupervised manner, and it was implemented in the Python programming language. OCA combines three methods. The first is the k-means algorithm, chosen for its simplicity in solving known clustering problems. The second is the median together with the median absolute deviation, which are known to be among the most robust measures and remain easy to use in the presence of outliers. The third is maxdist (the maximum distance of data objects allowed in a cluster), which is used to identify data objects assigned to more than one cluster. The main function of OCA is composed of three phases. The first phase segments the data objects into clusters using the k-means algorithm. The second phase detects abnormal values (outliers) in the dataset using the median and the median absolute deviation. The last phase identifies overlapping clusters, using maxdist as a predictor of data objects that can belong to multiple clusters. In the experiments, OCA demonstrated its capability to detect abnormal values (outliers) and to identify clusters with overlaps. The experiments indicated that OCA is a useful data analysis tool for data clustering, outlier detection, and the detection of overlapping clusters.
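The three phases described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation, since the abstract does not give the exact rules. The function names `kmeans`, `mad_outliers`, and `overlaps` are hypothetical, as are the caller-supplied `init` centroids. The sketch also assumes that outliers are scored on each point's distance to its own centroid, with a cutoff of 3.0 and the conventional 1.4826 scaling of the median absolute deviation. Finally, it assumes that maxdist is taken per cluster as the largest distance from a non-outlier member to its centroid. It uses only the Python standard library.

```python
import math
from statistics import median


def kmeans(points, init, iters=100):
    """Phase 1: plain Lloyd's k-means from caller-supplied starting centroids."""
    centroids = [tuple(map(float, c)) for c in init]
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Move each centroid to the mean of its members (keep it if it has none).
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def mad_outliers(points, centroids, labels, threshold=3.0):
    """Phase 2 (assumed rule): flag points unusually far from their own centroid."""
    outliers = set()
    for j, c in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue
        dists = [math.dist(points[i], c) for i in idx]
        med = median(dists)
        mad = median([abs(d - med) for d in dists])
        if mad == 0:
            continue  # no spread to measure against, so flag nothing
        for i, d in zip(idx, dists):
            # One-sided score: only distances far above the median count.
            if (d - med) / (1.4826 * mad) > threshold:
                outliers.add(i)
    return outliers


def overlaps(points, centroids, labels, outliers):
    """Phase 3 (assumed rule): an inlier also joins any cluster whose maxdist covers it."""
    maxdist = [0.0] * len(centroids)
    for i, lab in enumerate(labels):
        if i not in outliers:
            maxdist[lab] = max(maxdist[lab], math.dist(points[i], centroids[lab]))
    shared = {}
    for i, p in enumerate(points):
        if i in outliers:
            continue
        member_of = [j for j, c in enumerate(centroids)
                     if j == labels[i] or math.dist(p, c) <= maxdist[j]]
        if len(member_of) > 1:
            shared[i] = member_of
    return shared


# Two elongated clusters whose radii reach across the boundary between them.
points = [(0, 4), (0, -4), (-2.5, 0), (2.5, 0),
          (6, 4), (6, -4), (3.5, 0), (8.5, 0)]
centroids, labels = kmeans(points, init=[(0, 4), (6, 4)])
outliers = mad_outliers(points, centroids, labels)
shared = overlaps(points, centroids, labels, outliers)
print(centroids)  # [(0.0, 0.0), (6.0, 0.0)]
print(outliers)   # set()
print(shared)     # {3: [0, 1], 6: [0, 1]}
```

In this toy dataset the two points nearest the boundary, at indices 3 and 6, fall within the maxdist radius of both clusters and are reported as overlapping. One limitation of the sketch is that k-means centroids are means, so a distant point shifts the centroid of its cluster. On very small samples this shift can mask the outlier in the second phase.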