{"title":"用于大规模Web服务的页面摘要","authors":"D. Rocco, David J. Buttler, Ling Liu","doi":"10.1109/COEC.2003.1210274","DOIUrl":null,"url":null,"abstract":"We introduce Page Digest, a mechanism for efficient storage and processing of Web documents. The Page Digest design encourages a clean separation of the structural elements of Web documents from their content. Its encoding transformation produces many of the advantages of traditional string digest schemes yet remains invertible without introducing significant additional cost or complexity. Using the Page Digest encoding can provide at least an order of magnitude speedup when traversing a Web document as compared to using a standard document object model implementation. Our experiments show that change detection using Page Digest operates in linear time, offering 75% improvement in execution performance compared with existing systems. In addition, the Page Digest encoding can reduce the tag name redundancy found in Web documents, allowing 30% to 50% reduction in document size.","PeriodicalId":375124,"journal":{"name":"EEE International Conference on E-Commerce, 2003. CEC 2003.","volume":"103 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2003-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Page Digest for large-scale Web services\",\"authors\":\"D. Rocco, David J. Buttler, Ling Liu\",\"doi\":\"10.1109/COEC.2003.1210274\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We introduce Page Digest, a mechanism for efficient storage and processing of Web documents. The Page Digest design encourages a clean separation of the structural elements of Web documents from their content. Its encoding transformation produces many of the advantages of traditional string digest schemes yet remains invertible without introducing significant additional cost or complexity. Using the Page Digest encoding can provide at least an order of magnitude speedup when traversing a Web document as compared to using a standard document object model implementation. Our experiments show that change detection using Page Digest operates in linear time, offering 75% improvement in execution performance compared with existing systems. In addition, the Page Digest encoding can reduce the tag name redundancy found in Web documents, allowing 30% to 50% reduction in document size.\",\"PeriodicalId\":375124,\"journal\":{\"name\":\"EEE International Conference on E-Commerce, 2003. CEC 2003.\",\"volume\":\"103 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2003-06-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"EEE International Conference on E-Commerce, 2003. CEC 2003.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/COEC.2003.1210274\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"EEE International Conference on E-Commerce, 2003. CEC 2003.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COEC.2003.1210274","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We introduce Page Digest, a mechanism for efficient storage and processing of Web documents. The Page Digest design encourages a clean separation of the structural elements of Web documents from their content. Its encoding transformation produces many of the advantages of traditional string digest schemes yet remains invertible without introducing significant additional cost or complexity. Using the Page Digest encoding can provide at least an order of magnitude speedup when traversing a Web document as compared to using a standard document object model implementation. Our experiments show that change detection using Page Digest operates in linear time, offering 75% improvement in execution performance compared with existing systems. In addition, the Page Digest encoding can reduce the tag name redundancy found in Web documents, allowing 30% to 50% reduction in document size.