A similarity graph-based approach to declustering problems and its application towards parallelizing grid files

Proceedings of the Eleventh International Conference on Data Engineering
Pub Date: 1995-03-06
DOI: 10.1109/ICDE.1995.380370

Duen-Ren Liu, S. Shekhar


Citations: 39

Abstract

We propose a new similarity-based technique for declustering data. The proposed method can adapt to available information about query distributions, data distributions, data sizes, and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that a pair of data items that are accessed together by queries is allocated to distinct disks. We show that the proposed method can achieve optimal speed-up for a query set whenever some other declustering method achieves optimal speed-up for it. Experiments in parallelizing grid files show that the proposed method outperforms mapping-function-based methods for interesting query distributions as well as for non-uniform data distributions.
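The idea of max-cut partitioning under partition-size constraints can be illustrated with a small Python sketch. It is not the authors' algorithm, which the abstract does not specify; it is a simple greedy heuristic written under assumptions. In particular, the edge weight (the number of queries that access both items), the unit-cost response-time model, and all function names (`build_similarity_graph`, `greedy_decluster`, `response_time`) are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import ceil


def build_similarity_graph(queries):
    """Assumed weighting: edge (a, b) counts the queries that access both a and b."""
    weights = defaultdict(int)
    for q in queries:
        for a, b in combinations(sorted(set(q)), 2):
            weights[(a, b)] += 1
    return weights


def greedy_decluster(items, weights, num_disks, capacity):
    """Greedy max-cut heuristic (illustrative only): put each item on the
    non-full disk where it adds the least same-disk edge weight."""
    if num_disks * capacity < len(items):
        raise ValueError("not enough total capacity")
    neighbors = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbors[a][b] = w
        neighbors[b][a] = w
    # Place heavily connected items first; ties keep the input order.
    order = sorted(items, key=lambda x: -sum(neighbors[x].values()))
    allocation = {}
    load = [0] * num_disks
    for item in order:
        cost = [0] * num_disks  # same-disk weight added per candidate disk
        for other, w in neighbors[item].items():
            if other in allocation:
                cost[allocation[other]] += w
        candidates = [d for d in range(num_disks) if load[d] < capacity]
        best = min(candidates, key=lambda d: (cost[d], load[d], d))
        allocation[item] = best
        load[best] += 1
    return allocation


def response_time(query, allocation, num_disks):
    """Assumed cost model: one time unit per item, disks work in parallel,
    so a query takes as long as its most heavily loaded disk."""
    per_disk = [0] * num_disks
    for item in set(query):
        per_disk[allocation[item]] += 1
    return max(per_disk)


# Toy example: four data items, two disks, at most two items per disk.
items = ["a", "b", "c", "d"]
queries = [["a", "b"], ["c", "d"], ["a", "b", "c", "d"]]
graph = build_similarity_graph(queries)
allocation = greedy_decluster(items, graph, num_disks=2, capacity=2)
for q in queries:
    # Under the assumed model, the best possible time is ceil(|q| / num_disks).
    print(q, response_time(q, allocation, 2), ceil(len(q) / 2))
```

In this toy case the items that are queried together most often (`a` with `b`, `c` with `d`) end up on different disks, so every query meets the `ceil(|q| / num_disks)` bound under the assumed cost model. A greedy pass like this carries no optimality guarantee in general; it only shows how co-access information and capacity limits enter the allocation.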

