Parallel DBSCAN Clustering Algorithm Using Hadoop Map-reduce Framework for Spatial Data

International Journal of Information Technology and Computer Science. Pub Date: 2022-12-08. DOI: 10.5815/ijitcs.2022.06.01

M. C., C. H

Citations: 2

Abstract

Data clustering is a first step for future applications of big data analysis and a driving model for Artificial Intelligence and Machine Learning architectures. Processing large volumes of data quickly is a major challenge in these applications, and it requires fast, efficient algorithms. Parallel clustering algorithms are one promising design for speeding up the handling of such data. In this paper, a parallel algorithm for clustering spatial datasets, called P-DBSCAN, is implemented using the Hadoop map-reduce framework. The work aims to improve data clustering in data analytics applications. The new P-DBSCAN algorithm is executed over a generated dataset, and its results are compared with those of the existing DBSCAN algorithm to show the improvement in runtime performance. The work reports a reduction in execution time. In addition, the outcome of P-DBSCAN shows how the scalability problem of a large dataset can be resolved.
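The abstract does not describe how P-DBSCAN partitions the data or merges partial clusters, so the following is only a minimal in-memory sketch of one common way to phrase DBSCAN in map/shuffle/reduce terms. It is not the authors' implementation and does not use Hadoop. The grid-cell partitioning (cell side equal to `eps`, each point replicated to its 8 neighboring cells as a "halo"), the union-find merge, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `merge_phase`, `parallel_dbscan`) are assumptions introduced for illustration.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot

NOISE = -1

def map_phase(points, eps):
    """Emit (cell, (point_id, is_home)); each point also goes to the 8
    adjacent cells so every reducer sees the halo around its own cell."""
    for pid, (x, y) in enumerate(points):
        cx, cy = floor(x / eps), floor(y / eps)
        for dx, dy in product((-1, 0, 1), repeat=2):
            yield (cx + dx, cy + dy), (pid, dx == 0 and dy == 0)

def shuffle(pairs):
    """Group mapper output by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(members, points, eps, min_pts):
    """For every home point of this cell that is a core point,
    emit (point_id, ids of all points within eps, itself included)."""
    ids = [pid for pid, _ in members]
    for pid, is_home in members:
        if not is_home:
            continue
        px, py = points[pid]
        neigh = [q for q in ids
                 if hypot(points[q][0] - px, points[q][1] - py) <= eps]
        if len(neigh) >= min_pts:
            yield pid, neigh

def merge_phase(core, n):
    """Join core points that are within eps of each other (union-find),
    then attach border points to the cluster of a neighboring core point."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for p, neigh in core.items():
        for q in neigh:
            if q in core:
                parent[find(p)] = find(q)

    labels, roots = [NOISE] * n, {}
    for p in sorted(core):
        labels[p] = roots.setdefault(find(p), len(roots))
    for p in sorted(core):
        for q in core[p]:
            if labels[q] == NOISE:  # border point, not yet assigned
                labels[q] = labels[p]
    return labels

def parallel_dbscan(points, eps, min_pts):
    groups = shuffle(map_phase(points, eps))
    core = {}
    # Each cell is processed independently; on a cluster these iterations
    # would run as separate reduce tasks.
    for members in groups.values():
        core.update(reduce_phase(members, points, eps, min_pts))
    return merge_phase(core, len(points))
```

Because the cell side equals `eps`, every neighbor of a home point lies in the same cell or an adjacent one, so each reduce call can decide core-point status without seeing the rest of the dataset. Only the final merge is global, and a real deployment would need to distribute that step as well.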


