{"title":"改进历史网页数据的范围查询性能","authors":"Geng Li, Bo Peng","doi":"10.1109/ChinaGrid.2010.28","DOIUrl":null,"url":null,"abstract":"This paper is about the performance of range queries on historic web page data set, i.e. requests into a data set of web pages that keeps record of historic versions of HTML data of URLs on the web for a subset of data, the URLs and the timestamps of which satisfy the query conditions. To keep track of all versions of every web URL, the data set could easily scale up to terabytes. Hence, systems providing query services to such a data set would require much computing resource. We show that in this scenario data storage layout has significant impact on query performance and propose storage design principles for performance improvement through quantitative approaches.","PeriodicalId":429657,"journal":{"name":"2010 Fifth Annual ChinaGrid Conference","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improving Range Query Performance on Historic Web Page Data\",\"authors\":\"Geng Li, Bo Peng\",\"doi\":\"10.1109/ChinaGrid.2010.28\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper is about the performance of range queries on historic web page data set, i.e. requests into a data set of web pages that keeps record of historic versions of HTML data of URLs on the web for a subset of data, the URLs and the timestamps of which satisfy the query conditions. To keep track of all versions of every web URL, the data set could easily scale up to terabytes. Hence, systems providing query services to such a data set would require much computing resource. We show that in this scenario data storage layout has significant impact on query performance and propose storage design principles for performance improvement through quantitative approaches.\",\"PeriodicalId\":429657,\"journal\":{\"name\":\"2010 Fifth Annual ChinaGrid Conference\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-07-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 Fifth Annual ChinaGrid Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ChinaGrid.2010.28\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Fifth Annual ChinaGrid Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ChinaGrid.2010.28","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving Range Query Performance on Historic Web Page Data
This paper is about the performance of range queries on historic web page data set, i.e. requests into a data set of web pages that keeps record of historic versions of HTML data of URLs on the web for a subset of data, the URLs and the timestamps of which satisfy the query conditions. To keep track of all versions of every web URL, the data set could easily scale up to terabytes. Hence, systems providing query services to such a data set would require much computing resource. We show that in this scenario data storage layout has significant impact on query performance and propose storage design principles for performance improvement through quantitative approaches.