HDFS-based parallel and scalable pattern mining using clouds for incremental data

Int. J. Comput. Aided Eng. Technol. Pub Date: 2020-04-29 DOI: 10.1504/ijcaet.2020.10029059

S. Sountharrajan, E. Suganya, N. Aravindhraj, S. Sankarananth, C. Rajan


Citations: 0

Abstract


HDFS-based parallel and scalable pattern mining using clouds for incremental data

Growing internet usage has driven the migration of large volumes of data to cloud environments, which use Hadoop and the MapReduce framework to manage various mining applications in a distributed setting. Earlier research in distributed mining focused on solving complex problems using distributed computational techniques and new algorithmic designs. However, as the nature of the data and user requirements become more complex and demanding, existing distributed algorithms fall short in multiple respects. In this work, a new distributed frequent pattern algorithm, named Hadoop-based parallel frequent pattern mining (HPFP), is proposed to utilise clusters efficiently and to mine repeated patterns from large databases effectively. The empirical evaluation shows that HPFP improves the performance of the mining operation by increasing the level of parallelism and execution efficiency. HPFP achieves complete parallelism and delivers superior performance on HDFS compared with existing distributed pattern mining algorithms.
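The abstract does not describe the internals of HPFP, so the following is only a minimal sketch of the general map/reduce idea behind parallel frequent pattern mining, not the authors' algorithm. It counts the support of single items across data shards in plain Python, with no Hadoop dependency. The function names (`map_shard`, `reduce_counts`, `frequent_items`), the sample transactions, and the `min_support` threshold are all hypothetical and chosen for illustration.

```python
from collections import Counter
from functools import reduce
from typing import Dict, Iterable, List

Transaction = List[str]


def map_shard(shard: Iterable[Transaction]) -> Counter:
    """Map step (illustrative): per-shard count of transactions containing each item."""
    counts: Counter = Counter()
    for transaction in shard:
        # set() ensures an item is counted once per transaction
        counts.update(set(transaction))
    return counts


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step (illustrative): sum per-shard counts into global supports."""
    return reduce(lambda a, b: a + b, partials, Counter())


def frequent_items(shards: Iterable[Iterable[Transaction]], min_support: int) -> Dict[str, int]:
    """Keep items whose global support reaches min_support."""
    # On a cluster each shard would be processed by a separate mapper;
    # here the shards are processed sequentially for simplicity.
    partials = [map_shard(shard) for shard in shards]
    totals = reduce_counts(partials)
    return {item: n for item, n in totals.items() if n >= min_support}


# Hypothetical sample data: two shards of two transactions each.
shards = [
    [["bread", "milk"], ["bread", "butter"]],
    [["milk", "butter", "bread"], ["milk"]],
]
result = frequent_items(shards, min_support=3)
```

With this sample data, `result` holds `bread` and `milk`, each with a support of 3, while `butter` (support 2) is filtered out. Because the per-shard counts are additive, a newly arrived shard could in principle be folded into the totals without recounting old shards, which is one reason this style of counting suits incremental data; whether HPFP does this is not stated in the abstract.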

