Optimal Grid Exploitation Algorithms for Data Mining

Valérie Fiolet, R. Olejnik, Guillem Lefait, B. Toursel
Published in: 2006 Fifth International Symposium on Parallel and Distributed Computing
Publication date: 2006-07-06
DOI: 10.1109/ISPDC.2006.36 (https://doi.org/10.1109/ISPDC.2006.36)
Citations: 18

Abstract

Optimal Grid Exploitation Algorithms for Data Mining
Although many data mining tasks have been parallelized and can thus be executed on dedicated clusters, few solutions currently exist for solving data mining problems on a grid or a non-specialized network of workstations. The current trend is to use grids and/or desktop grids in order to exploit any available workstation, regardless of its physical location. Although a grid-specific algorithm shares some characteristics with a dedicated-cluster algorithm, many constraints are inherent to the use of a grid. In particular, resource volatility and communication costs reduce the effectiveness of parallelism. The DisDaMin (Distributed Data Mining) project revisits data mining tasks and proposes new algorithms suited to grids. The DisDaMin mechanisms first fragment the data using clustering methods, and then apply asynchronous collaborative techniques adapted to the specifics of execution on grids. This fragmentation method makes it possible to carry out optimal local processing on each node with a minimum of communication. Building on this, we introduce the distributed algorithm DICCoop, an adaptation of DIC by Brin et al. (1997). Simulations hosted on the French national grid GRID5000 (part of the European CoreGrid) were performed to demonstrate the efficiency of the proposed mechanisms. We analyse the impact of the numerous parameters on the optimization of parallel efficiency.
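The abstract describes a two-stage idea: fragment the data with a clustering method, then process each fragment locally with as little communication as possible. The Python sketch below illustrates that general idea only; it is not DICCoop and not the DisDaMin implementation. The function names, the use of Jaccard similarity, the hand-picked seed transactions, and the restriction to item pairs are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two item sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def fragment(transactions, seeds):
    """Assign each transaction to its most similar seed: one fragment per seed.

    A stand-in for the clustering-based fragmentation mentioned in the
    abstract; the real method is not specified there.
    """
    fragments = [[] for _ in seeds]
    for t in transactions:
        best = max(range(len(seeds)), key=lambda i: jaccard(t, seeds[i]))
        fragments[best].append(t)
    return fragments


def local_pair_counts(rows):
    """Count co-occurring item pairs inside one fragment (no communication)."""
    counts = Counter()
    for t in rows:
        counts.update(combinations(sorted(t), 2))
    return counts


def merge_counts(partials):
    """Single exchange step: sum the per-fragment counts."""
    total = Counter()
    for c in partials:
        total.update(c)
    return total


# Hypothetical market-basket data and seeds, chosen only for the example.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"bread", "milk"},
]
seeds = [{"bread", "milk"}, {"beer", "chips"}]

fragments = fragment(transactions, seeds)
partials = [local_pair_counts(f) for f in fragments]  # one per grid node
merged = merge_counts(partials)

min_support = 2
frequent = {pair: n for pair, n in merged.items() if n >= min_support}
print(frequent)
```

Because similar transactions land in the same fragment, each candidate pair in this toy data set is counted entirely within one fragment, so the merge step only has to sum disjoint counters. Real data will not separate this cleanly, which is presumably where the asynchronous collaborative techniques mentioned in the abstract come in.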