Authors: Anh Tuan Tran, B. Q. Tran, Kien Trung Luong
Venue: Proceedings of the 2023 12th International Conference on Software and Computer Applications
Publication date: 2023-02-23
DOI: https://doi.org/10.1145/3587828.3587829
Improvement for Large-Scale Image Data using Fuzzy Rough C-Mean Based Unsupervised CNN Clustering: An Empirical Study on designbyhumans.com
Abstract: Clustering analysis, particularly of large-scale image data, is increasingly applied in fields such as finance, risk management, and prediction, and has been a frequent subject of scientific discussion. Both deep learning, a widely used approach, and classical methods address complex classification problems arising from real-world cases. In this study, we applied several approaches to such problems and measured their effectiveness by combining different techniques and comparing the results across different scenarios. Many approaches have been proposed for the clustering problem, including hierarchical, density-based, centroid-based, and graph-theoretical methods. In real-world applications, however, these methods show significant drawbacks when the dataset contains immeasurable vagueness, uncertainty, or overlapping samples that make prediction and classification impossible. Several attempts have been made to improve clustering performance, including joint CNN clustering models. Still, many of them inherit the drawbacks of the complicated clustering method they rely on, which limits the capability of the CNN. The combined CNN clustering method is designed to address the problems of those deterministic CNN clustering models. We evaluated it on a dataset collected from the website designbyhumans.com, which has enough features to represent a non-synthetic dataset. This research aims to improve upon the established model by using estimation techniques to determine model parameters, with plots to justify those choices and to give insight into how the model performs on a non-synthetic dataset like ours. We concluded that the model improved significantly over a popular complex clustering method, as evaluated by computational time and by different metrics representing how well separated the clusters were. Based on the conducted experiments and the future development of the method, we discussed and addressed some of the drawbacks of this approach.
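The abstract contrasts deterministic clustering with a fuzzy approach for overlapping samples. As a rough illustration only, the sketch below implements plain fuzzy c-means, which is a simpler relative of the fuzzy rough c-means named in the title and not the authors' method. It uses no CNN features. The 2-D points, the function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, and the farthest-point initialisation are all assumptions made for this example.

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    """Plain fuzzy c-means on a list of equal-length tuples.

    Returns (centers, memberships). memberships[i][k] is the degree to
    which points[i] belongs to cluster k, and each row sums to 1.
    """
    dim = len(points[0])

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Assumed initialisation: the first point, then repeatedly the point
    # farthest from the centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(sq_dist(p, ctr) for ctr in centers))
        centers.append(list(far))

    memberships = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        memberships = []
        for p in points:
            d2 = [sq_dist(p, ctr) for ctr in centers]  # squared distances
            if min(d2) == 0.0:
                # The point sits exactly on a center: full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in d2]
            else:
                row = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(row)
            memberships.append([v / total for v in row])
        # Center update: mean of all points weighted by u_ik^m.
        for k in range(c):
            w = [row[k] ** m for row in memberships]
            total_w = sum(w)
            centers[k] = [
                sum(wi * p[j] for wi, p in zip(w, points)) / total_w
                for j in range(dim)
            ]
    return centers, memberships


# Hypothetical data: two tight groups plus one sample lying between them.
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (4.0, 4.0), (4.2, 3.9), (3.9, 4.1),
    (2.0, 2.1),
]
centers, memberships = fuzzy_c_means(points)
for p, row in zip(points, memberships):
    print(p, [round(v, 3) for v in row])
```

In this toy setting the in-between sample receives a membership split across both clusters instead of a forced hard label, which is the kind of graded assignment a fuzzy method can offer for overlapping samples. A hard-assignment method would have to place that sample in exactly one cluster.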