OCA: Overlapping Clustering Application: Unsupervised Approach for Data Analysis

A. E. Danganan, Ariel M. Sison, Ruji P. Medina
2018 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS). Published 2018-10-01. DOI: 10.1109/ICIIBMS.2018.8550020 (https://doi.org/10.1109/ICIIBMS.2018.8550020)
Citations: 4

Abstract

This paper presents a new data analysis tool, the Overlapping Clustering Application (OCA), developed to identify overlapping clusters and outliers in an unsupervised manner. OCA was implemented in Python. It combines three methods. The k-means algorithm was chosen for its simplicity in solving well-known clustering problems. The median and the median absolute deviation (MAD) were chosen because they are among the most robust measures and remain easy to use in the presence of outliers. Maxdist, the maximum distance of data objects allowed in a cluster, is used to identify data objects that are assigned to more than one cluster. The main function of OCA consists of three phases. The first phase segments the data objects into clusters using the k-means algorithm. The second phase detects abnormal values (outliers) in the dataset using the median and MAD. The final phase identifies overlapping clusters, using maxdist as a predictor of data objects that can belong to multiple clusters. In the experiments, OCA detected abnormal values (outliers) and identified clusters with overlaps, which indicates that it is a useful data analysis tool for data clustering, outlier detection, and the detection of overlapping clusters.
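The three phases can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the MAD rule or maxdist are computed, so the sketch assumes that MAD is applied to each object's distance from its own centroid, that an object is an outlier when its scaled deviation exceeds a cutoff of 3.0, and that maxdist is the largest centroid distance among all non-outlier objects. The function names, the cutoff, and the toy data are all hypothetical.

```python
import math
import statistics


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """Phase 1: Lloyd's k-means, started from caller-supplied centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(statistics.fmean(axis) for axis in zip(*members))
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def mad_outliers(points, labels, centroids, cutoff=3.0):
    """Phase 2 (assumed rule): flag objects unusually far from their centroid."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flags = [False] * len(points)
    for j in range(len(centroids)):
        idx = [i for i, lab in enumerate(labels) if lab == j]
        if not idx:
            continue
        med = statistics.median(dists[i] for i in idx)
        mad = statistics.median(abs(dists[i] - med) for i in idx)
        if mad == 0:
            continue  # no spread in this cluster, so nothing can be flagged
        for i in idx:
            # 1.4826 makes MAD comparable to a standard deviation for normal data.
            flags[i] = (dists[i] - med) / (1.4826 * mad) > cutoff
    return flags, dists


def overlaps(points, labels, centroids, flags, dists):
    """Phase 3 (assumed rule): join every cluster whose centroid is within maxdist."""
    maxdist = max((d for d, out in zip(dists, flags) if not out), default=0.0)
    memberships = []
    for p, lab, out in zip(points, labels, flags):
        if out:
            memberships.append(set())  # outliers are left out of every cluster
        else:
            memberships.append({j for j in range(len(centroids))
                                if j == lab or math.dist(p, centroids[j]) <= maxdist})
    return memberships


# Toy data: two vertical strips, one point between them, one far-away point.
points = [(0, -4), (0, -2), (0, 0), (0, 2), (0, 4),
          (6, -4), (6, -2), (6, 0), (6, 2), (6, 4),
          (2, 0), (-14, 0)]
labels, centroids = kmeans(points, centroids=[(0, 0), (6, 0)])
flags, dists = mad_outliers(points, labels, centroids)
memberships = overlaps(points, labels, centroids, flags, dists)
for p, out, m in zip(points, flags, memberships):
    print(p, "outlier" if out else sorted(m))
```

On this toy data the point (-14, 0) is flagged as an outlier and the point (2, 0) is reported as a member of both clusters, while every other point stays in a single cluster. Note that the sketch does not recompute centroids after removing outliers; whether OCA does so is not stated in the abstract.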