Handling Structured Data Using Data Mining Clustering Techniques

Seema Maitrey, C. K. Jha
Published in: 2019 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT), 2019-09-01
DOI: 10.1109/ICICT46931.2019.8977647 (https://doi.org/10.1109/ICICT46931.2019.8977647)
Citations: 1

Abstract

In the new era, every organization is able to store extremely large amounts of data. The continuous rise in data capture is turning these stores into a huge tomb of data that is becoming difficult to analyse, and the constantly growing data sets challenge researchers who try to discover knowledge in them. Valuable information lies buried in these collections and can be extracted with data mining, which is able to dig embedded information out of large datasets. Many application areas required this technique, which led to the evolution of many data mining methods, although not all of them could deal with high-volume data. Numerous computation- and data-intensive scientific data analyses have been established to keep pace. Now that today's data has become Big Data, it requires large-scale data mining analyses to meet its scalability and performance requirements. To serve such data, several efficient parallel and concurrent algorithms were applied. These algorithms used different parallelization techniques to manage the huge volume of data and put it to practical use. Formerly these techniques were threads, MPI, and similar models, which offer different performance and usability characteristics. The MPI model was efficient for computation-intensive problems but difficult to use in practice. Meanwhile, data mining continues to spread its roots in business and in learning organizations. The integrated clustering algorithm CURE is more robust to outliers and recognizes clusters of irregular shape and varying size. CURE combines random sampling and partitioning, which ensures that the quality of its output clusters is much improved relative to the clusters produced by prior algorithms. This paper focuses on the CURE clustering technique, which was found suitable for working with large databases.
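The combination of random sampling, partitioning, and outlier handling mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the parameters (`sample_size`, `partitions`, `reduce_by`, `c`, `alpha`, `min_size`), and the brute-force closest-pair search are all assumptions chosen for brevity, whereas published descriptions of CURE rely on a heap and a k-d tree for efficiency.

```python
import math
import random

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def representatives(points, c, alpha):
    """Pick up to c well-scattered points and shrink them toward the centroid."""
    mean = centroid(points)
    # Start with the point farthest from the centroid, then repeatedly add
    # the point farthest from the representatives chosen so far.
    chosen = [max(points, key=lambda p: math.dist(p, mean))]
    while len(chosen) < min(c, len(points)):
        chosen.append(max(points, key=lambda p: min(math.dist(p, r) for r in chosen)))
    return [tuple(x + alpha * (m - x) for x, m in zip(r, mean)) for r in chosen]

def make_cluster(points, c, alpha):
    return {"points": points, "reps": representatives(points, c, alpha)}

def cluster_distance(a, b):
    # Distance between clusters = distance between their closest representatives.
    return min(math.dist(r, s) for r in a["reps"] for s in b["reps"])

def agglomerate(clusters, target, c, alpha):
    """Merge the two closest clusters until only `target` remain (brute force)."""
    clusters = list(clusters)
    while len(clusters) > target:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = make_cluster(clusters[i]["points"] + clusters[j]["points"], c, alpha)
        clusters = [cl for n, cl in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters

def cure(data, k, sample_size=60, partitions=2, reduce_by=3,
         c=4, alpha=0.3, min_size=2, seed=0):
    """Simplified CURE-style clustering; all parameter names are hypothetical."""
    rng = random.Random(seed)
    # 1. Work on a random sample instead of the full data set.
    sample = rng.sample(data, min(sample_size, len(data)))
    # 2. Partition the sample and pre-cluster each partition separately.
    partial = []
    for p in range(partitions):
        part = sample[p::partitions]
        singles = [make_cluster([pt], c, alpha) for pt in part]
        partial += agglomerate(singles, max(k, len(part) // reduce_by), c, alpha)
    # 3. Treat very small partial clusters as outliers and drop them.
    kept = [cl for cl in partial if len(cl["points"]) >= min_size]
    if len(kept) < k:
        kept = partial
    # 4. Cluster the surviving partial clusters down to k.
    final = agglomerate(kept, k, c, alpha)
    # 5. Label every original point by its nearest representative.
    labels = [
        min(range(len(final)),
            key=lambda idx: min(math.dist(pt, r) for r in final[idx]["reps"]))
        for pt in data
    ]
    return labels, final

# Toy data: two well-separated square blobs of 40 points each.
rng = random.Random(1)
blob_a = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(40)]
blob_b = [(rng.uniform(9, 11), rng.uniform(9, 11)) for _ in range(40)]
data = blob_a + blob_b
labels, clusters = cure(data, k=2, sample_size=len(data))
```

Shrinking the representatives toward the centroid (controlled by `alpha`) is what dampens the influence of outliers, while using several representatives per cluster rather than a single centroid is what lets the method follow irregular shapes. On real data the sample size and partition count would need tuning; the toy call above samples the whole data set only to keep the example deterministic.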