Bruno Diniz, D. Nogueira, André Cardoso, R. Ferreira, Dorgival Olavo Guedes Neto, Wagner Meira Jr
Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), 2006-05-16
DOI: 10.1109/CCGRID.2006.21
Assessing Data Virtualization for Irregularly Replicated Large Datasets
Large volumes of data are generated every day by experiments, simulations, and all sorts of applications. Portions of this data are often irregularly replicated and distributed across different data sources. It is desirable to handle these pieces of irregular data (replicated or not) as a single large dataset. This is called data virtualization, and it is the focus of this paper. We present a system that handles irregularly replicated data and creates a virtual view of the union of the individual irregular portions of data hosted by each data source. Our system indexes the data intervals from each data source and allows clients to submit queries against the resulting virtual dataset. To select which server will be responsible for each data interval of a query, we use and compare three algorithms: Random, Round-Robin, and Weighted Round-Robin. The comparison is driven by simulation, and all simulation parameters are taken from a real data-centered application (the Virtual Microscope).
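To make the server-selection step concrete, the following is a minimal Python sketch of how a query over the virtual dataset might be cut into intervals and each interval assigned to one replica holder. It is an illustration under stated assumptions, not the system described in the paper: the catalog layout, the one-dimensional half-open intervals, the names `CATALOG`, `holders`, `split_query`, and `assign`, the global turn counter used for Round-Robin, and the credit-based form of Weighted Round-Robin with hand-picked weights are all hypothetical. The abstract does not say how the weights are derived or how the policies are implemented.

```python
import random

# Hypothetical catalog: each server lists the half-open intervals [start, end)
# of the virtual dataset that it stores. Replication is irregular on purpose.
CATALOG = {
    "s1": [(0, 50), (80, 100)],
    "s2": [(30, 90)],
    "s3": [(0, 100)],
}


def holders(catalog, start, end):
    """Servers with a stored interval that fully covers [start, end)."""
    return sorted(
        name for name, intervals in catalog.items()
        if any(a <= start and end <= b for a, b in intervals)
    )


def split_query(catalog, start, end):
    """Cut the query at every stored boundary inside it, so that each
    resulting piece has one fixed set of replica holders."""
    cuts = {start, end}
    for intervals in catalog.values():
        for a, b in intervals:
            cuts.update(p for p in (a, b) if start < p < end)
    points = sorted(cuts)
    return list(zip(points, points[1:]))


class RoundRobin:
    """Assumed variant: one global turn counter, taken modulo the number
    of candidates for the current piece."""

    def __init__(self):
        self.turn = 0

    def pick(self, candidates):
        choice = candidates[self.turn % len(candidates)]
        self.turn += 1
        return choice


class WeightedRoundRobin:
    """Assumed variant: credit-based ("smooth") weighted round-robin.
    Every candidate earns its weight, the richest candidate is chosen
    and then pays back the total weight of the candidates."""

    def __init__(self, weights):
        self.weights = weights
        self.credit = {name: 0 for name in weights}

    def pick(self, candidates):
        for name in candidates:
            self.credit[name] += self.weights[name]
        choice = max(candidates, key=lambda name: self.credit[name])
        self.credit[choice] -= sum(self.weights[name] for name in candidates)
        return choice


def assign(catalog, start, end, pick):
    """Return [((a, b), server), ...] for every piece some server holds."""
    plan = []
    for a, b in split_query(catalog, start, end):
        candidates = holders(catalog, a, b)
        if candidates:  # pieces that no server stores are left out
            plan.append(((a, b), pick(candidates)))
    return plan


if __name__ == "__main__":
    policies = {
        "Random": random.Random(0).choice,
        "Round-Robin": RoundRobin().pick,
        # Hypothetical weights, e.g. relative server capacity.
        "Weighted Round-Robin": WeightedRoundRobin({"s1": 1, "s2": 1, "s3": 2}).pick,
    }
    for label, pick in policies.items():
        print(label, assign(CATALOG, 0, 100, pick))
```

Passing the policy as a plain callable keeps the interval logic independent of the selection strategy, which mirrors the comparison the abstract describes: the same query plan is produced three times, and only the choice among replica holders changes.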