Improvement for Large-Scale Image Data using Fuzzy Rough C-Mean Based Unsupervised CNN Clustering: An Empirical Study on designbyhumans.com

Anh Tuan Tran, B. Q. Tran, Kien Trung Luong
Proceedings of the 2023 12th International Conference on Software and Computer Applications. Published 2023-02-23. DOI: 10.1145/3587828.3587829 (https://doi.org/10.1145/3587828.3587829)

Abstract

Clustering analysis, particularly of large-scale image data, is increasingly applied in fields such as finance, risk management, and prediction, and is a recurring topic in the scientific literature. Both deep learning and classical methods are used to address complex classification problems arising from real-world cases. In this study, we applied several approaches to such problems and measured their effectiveness by combining different techniques and comparing the results across scenarios. Many clustering methods have been proposed, including hierarchical, density-based, centroid-based, and graph-theoretic methods. In real-world applications, however, these methods show significant drawbacks when the dataset introduces unmeasurable vagueness, uncertainty, or overlapping samples, which make prediction and classification infeasible. Several attempts have been made to improve clustering performance, including joint CNN clustering models, but many of them inherit the drawbacks of the underlying clustering method, which limits the capability of the CNN. The combined CNN clustering method is designed to address the problems of those deterministic CNN clustering models. We evaluated it on a dataset collected from the website designbyhumans.com, which has enough features to be representative of a non-synthetic dataset. This research aims to improve on the established model by using estimation techniques to determine model parameters, with plots to justify those choices and to give insight into how the model performs on a non-synthetic dataset such as ours. We conclude that the model improves significantly on a popular complex clustering method, as evaluated by computational time and by several metrics of how well separated the clusters are. Based on the experiments conducted, we also discuss some drawbacks of the approach and directions for its future development.
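The abstract does not state the update rules of the fuzzy rough C-means model. As background only, the following is a minimal sketch of textbook fuzzy c-means, the soft-membership scheme that fuzzy rough variants build on. It is not the authors' method: the function name `fuzzy_c_means`, the fuzzifier `m`, and the other parameters are illustrative assumptions, and the input is assumed to be a plain feature matrix rather than CNN embeddings.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Textbook fuzzy c-means (illustrative; not the paper's fuzzy rough variant)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centroids are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every centroid, floored to avoid division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)), normalised per sample.
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Each row of the returned membership matrix sums to 1, so a sample lying between two overlapping clusters receives partial membership in both instead of a hard label, which is the kind of vagueness the abstract refers to.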