Handling Structured Data Using Data Mining Clustering Techniques

Seema Maitrey, C. K. Jha
2019 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT)
DOI: 10.1109/ICICT46931.2019.8977647
Today, every organization can store extremely large amounts of data. The continuous growth in data capture is turning these stores into vast data tombs that are increasingly difficult to analyse, and this ever-growing volume makes discovering knowledge in it a challenge for researchers. Valuable information lies buried in such collections. Data mining techniques can extract it because they are able to dig out the precious information embedded in large datasets. Demand from many application areas has driven the development of numerous data mining methods, yet not all of them can cope with very high data volumes. Many compute- and data-intensive scientific analyses have been developed to keep pace. Because today's data has become Big Data, it now requires large-scale data mining analyses that meet its scalability and performance requirements. To serve such data, several efficient parallel and concurrent algorithms have been applied. These algorithms rely on different parallelization techniques to manage huge data volumes and put them to practical use. Earlier techniques included threads and MPI, among others, each with different performance and usability characteristics. The MPI model is efficient for computationally demanding problems but is difficult to apply in practice. Over the years, data mining has continued to spread its roots in business and in educational organizations. The integrated clustering algorithm CURE is more robust to outliers than earlier algorithms and recognizes clusters of irregular shape and varying size.
CURE combines random sampling and partitioning, which substantially improves the quality of its output clusters compared with those produced by earlier algorithms. This paper focuses on the CURE clustering technique, which has been found suitable for working with large databases.
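The combination of random sampling, partitioning and representative points can be illustrated with a minimal Python sketch. It assumes Euclidean points given as equal-length tuples. The function names and parameters (`c` representatives per cluster, shrink factor `alpha`, `sample_size`, `partitions`, `reduce_factor`) and their default values are hypothetical choices for illustration, not taken from this paper. For brevity it omits CURE's explicit outlier-elimination step and uses a naive closest-pair search instead of the heap and k-d tree used in the original CURE algorithm.

```python
import math
import random


def _centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def _representatives(points, c, alpha):
    """Pick up to c well-scattered points, then shrink each toward the centroid by alpha."""
    mean = _centroid(points)
    scattered = []
    for _ in range(min(c, len(points))):
        if not scattered:
            # First pick: the point farthest from the cluster mean.
            far = max(points, key=lambda q: math.dist(q, mean))
        else:
            # Later picks: the point farthest from all points picked so far.
            far = max(points, key=lambda q: min(math.dist(q, s) for s in scattered))
        scattered.append(far)
    # Shrinking pulls representatives inward, damping the influence of outliers.
    return [tuple(s + alpha * (m - s) for s, m in zip(p, mean)) for p in scattered]


def _agglomerate(groups, target, c, alpha):
    """Merge the two clusters with the closest representatives until `target` remain."""
    clusters = [list(g) for g in groups]
    reps = [_representatives(g, c, alpha) for g in clusters]
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Cluster distance = closest pair of representative points.
                d = min(math.dist(a, b) for a in reps[i] for b in reps[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        # Delete index j first; since j > i, index i is unaffected.
        del clusters[j]
        del reps[j]
        clusters[i] = merged
        reps[i] = _representatives(merged, c, alpha)
    return clusters, reps


def cure(data, k, c=4, alpha=0.3, sample_size=200, partitions=2, reduce_factor=3, seed=0):
    """Return a cluster index for every point in `data` (hypothetical sketch)."""
    rng = random.Random(seed)
    # Step 1: draw a random sample so the expensive hierarchy runs on fewer points.
    sample = rng.sample(data, min(sample_size, len(data)))
    # Step 2: split the sample into partitions and partially cluster each one.
    size = max(1, math.ceil(len(sample) / partitions))
    partial = []
    for start in range(0, len(sample), size):
        part = sample[start:start + size]
        target = max(k, len(part) // reduce_factor)
        groups, _ = _agglomerate([[p] for p in part], target, c, alpha)
        partial.extend(groups)
    # Step 3: cluster the partial clusters down to k final clusters.
    _, reps = _agglomerate(partial, k, c, alpha)
    # Step 4: label every original point by its nearest representative point.
    return [
        min(range(len(reps)), key=lambda i: min(math.dist(p, r) for r in reps[i]))
        for p in data
    ]
```

For example, `cure(points, k=2)` on two well-separated groups of points should assign one index to each group. Representing a cluster by several scattered points rather than a single centroid is what lets this approach follow elongated or irregular shapes, while shrinking those points by `alpha` keeps a few stray points from dominating merge decisions.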