使用云对增量数据进行基于hdfs的并行和可扩展模式挖掘

S. Sountharrajan, E. Suganya, N. Aravindhraj, S. Sankarananth, C. Rajan
{"title":"使用云对增量数据进行基于hdfs的并行和可扩展模式挖掘","authors":"S. Sountharrajan, E. Suganya, N. Aravindhraj, S. Sankarananth, C. Rajan","doi":"10.1504/ijcaet.2020.10029059","DOIUrl":null,"url":null,"abstract":"Increased usage of internet led to the migration of large amount of data to the cloud environment which uses Hadoop and map reduce framework for managing various mining applications in distributed environment. Earlier research activity in distributed mining comprises of solving complex problems using distributed computational techniques and new algorithmic designs. But as the nature of the data and user requirement becomes more complex and demanding, the existing distributed algorithms fails in multiple aspects. In our work, a new distributed frequent pattern algorithm, named Hadoop-based parallel frequent pattern mining (HPFP) has been proposed to optimally utilise the clusters efficiently and mine repeated patterns from large databases very effectively. The empirical evaluation shows that HPFP algorithm improves the performance of mining operation by increasing the level of parallelism and execution efficacy. HPFP achieves complete parallelism and delivers superior performance to become an efficient algorithm in HDFS, than existing distributed pattern mining algorithms.","PeriodicalId":346646,"journal":{"name":"Int. J. Comput. Aided Eng. Technol.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HDFS-based parallel and scalable pattern mining using clouds for incremental data\",\"authors\":\"S. Sountharrajan, E. Suganya, N. Aravindhraj, S. Sankarananth, C. Rajan\",\"doi\":\"10.1504/ijcaet.2020.10029059\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Increased usage of internet led to the migration of large amount of data to the cloud environment which uses Hadoop and map reduce framework for managing various mining applications in distributed environment. Earlier research activity in distributed mining comprises of solving complex problems using distributed computational techniques and new algorithmic designs. But as the nature of the data and user requirement becomes more complex and demanding, the existing distributed algorithms fails in multiple aspects. In our work, a new distributed frequent pattern algorithm, named Hadoop-based parallel frequent pattern mining (HPFP) has been proposed to optimally utilise the clusters efficiently and mine repeated patterns from large databases very effectively. The empirical evaluation shows that HPFP algorithm improves the performance of mining operation by increasing the level of parallelism and execution efficacy. HPFP achieves complete parallelism and delivers superior performance to become an efficient algorithm in HDFS, than existing distributed pattern mining algorithms.\",\"PeriodicalId\":346646,\"journal\":{\"name\":\"Int. J. Comput. Aided Eng. Technol.\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-04-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Comput. Aided Eng. Technol.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1504/ijcaet.2020.10029059\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Aided Eng. Technol.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1504/ijcaet.2020.10029059","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

互联网使用的增加导致大量数据迁移到云环境,使用Hadoop和map reduce框架来管理分布式环境中的各种挖掘应用程序。分布式采矿的早期研究活动包括使用分布式计算技术和新的算法设计来解决复杂问题。但是随着数据性质和用户需求的日益复杂和苛刻,现有的分布式算法在很多方面都出现了缺陷。在我们的工作中,提出了一种新的分布式频繁模式算法,即基于hadoop的并行频繁模式挖掘(HPFP),以最优地利用集群,有效地从大型数据库中挖掘重复模式。实证评价表明,HPFP算法通过提高并行度和执行效率,提高了挖掘操作的性能。与现有的分布式模式挖掘算法相比,HPFP实现了完全并行性,并提供了优越的性能,成为HDFS中高效的算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
HDFS-based parallel and scalable pattern mining using clouds for incremental data
Increased usage of internet led to the migration of large amount of data to the cloud environment which uses Hadoop and map reduce framework for managing various mining applications in distributed environment. Earlier research activity in distributed mining comprises of solving complex problems using distributed computational techniques and new algorithmic designs. But as the nature of the data and user requirement becomes more complex and demanding, the existing distributed algorithms fails in multiple aspects. In our work, a new distributed frequent pattern algorithm, named Hadoop-based parallel frequent pattern mining (HPFP) has been proposed to optimally utilise the clusters efficiently and mine repeated patterns from large databases very effectively. The empirical evaluation shows that HPFP algorithm improves the performance of mining operation by increasing the level of parallelism and execution efficacy. HPFP achieves complete parallelism and delivers superior performance to become an efficient algorithm in HDFS, than existing distributed pattern mining algorithms.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信