Proceedings of the 2009 ACM SIGMOD International Conference on Management of data: latest articles

ExQueX: exploring and querying XML documents

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data Pub Date : 2009-06-29 DOI: 10.1145/1559845.1559993

B. Kimelfeld, Y. Sagiv, Gidi Weber

Citations: 5

Session details: Industrial session 5: transactions, security, and cashing

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data Pub Date : 2009-06-29 DOI: 10.1145/3257474

Bettina Kemme

Citations: 0

Robust web extraction: an approach based on a probabilistic tree-edit model

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data Pub Date : 2009-06-29 DOI: 10.1145/1559845.1559882

Nilesh N. Dalvi, P. Bohannon, Fei Sha

Abstract: On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus the tree structure evolve over time, causing wrappers to break repeatedly, and resulting in a high cost of maintaining wrappers. In this paper, we explore a novel approach: we use temporal snapshots of web pages to develop a tree-edit model of HTML, and use this model to improve wrapper construction. We view the changes to the tree structure as suppositions of a series of edit operations: deleting nodes, inserting nodes and substituting labels of nodes. The tree structures evolve by choosing these edit operations stochastically. Our model is attractive in that the probability that a source tree has evolved into a target tree can be estimated efficiently (in quadratic time in the size of the trees), making it a potentially useful tool for a variety of tree-evolution problems. We give an algorithm to learn the probabilistic model from training examples consisting of pairs of trees, and apply this algorithm to collections of web-page snapshots to derive HTML-specific tree edit models. Finally, we describe a novel wrapper-construction framework that takes the tree-edit model into account, and compare the quality of resulting wrappers to that of traditional wrappers on synthetic and real HTML document examples.

Citations: 80

MayBMS: a probabilistic database management system

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data Pub Date : 2009-06-29 DOI: 10.1145/1559845.1559984

Jiewen Huang, Lyublena Antova, Christoph E. Koch, Dan Olteanu

Citations: 166

Exploring schema repositories with schemr

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data Pub Date : 2009-06-29 DOI: 10.1145/1559845.1559991

Kuang Chen, J. Madhavan, A. Halevy

Citations: 5

FlashLogging: exploiting flash devices for synchronous logging performance

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data Pub Date : 2009-06-29 DOI: 10.1145/1559845.1559855

Shimin Chen

Abstract: Synchronous transactional logging is the central mechanism for ensuring data persistency and recoverability in database systems. Unfortunately, magnetic disks are ill-suited for the small sequential write pattern of synchronous logging. Alternative solutions (e.g., backup servers or sophisticated battery-backed write caches in high-end disk arrays) are either expensive or complicated. In this paper, we exploit flash devices for synchronous logging based on the observation that flash devices support small sequential writes well. Comparing a wide variety of flash devices, we find that USB flash drives are a good match for this task because of their unique characteristics: widely available USB ports, hot-plug capability useful for coping with flash wear, and low price so that multiple drives are affordable. We propose FlashLogging, a logging solution that exploits multiple (USB) flash drives for synchronous logging. We identify and address four challenges: (i) efficiently exploiting multiple flash drives for logging; (ii) coping with the large variance of write latencies because of device erasure operations; (iii) efficient recovery processing; and (iv) combining flash drives and disks for better logging and recovery performance. We implemented our solution within MySQL-InnoDB. Our real machine experiments running online transaction processing workloads (TPCC) show that FlashLogging achieves up to 5.7X improvements over magnetic-disk-based logging, and obtains up to 98.6% of the ideal performance. We further compare our design with one that uses Solid-State Drives (SSDs), and find that although SSDs improve logging performance, multiple USB flash drives can achieve comparable or better performance at a much lower price.

Citations: 106

Session details: Research session 14: understanding data and queries

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data Pub Date : 2009-06-29 DOI: 10.1145/3257462

D. Srivastava

Citations: 0

ELMR: lightweight mobile health records

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data Pub Date : 2009-06-29 DOI: 10.1145/1559845.1559974

Arvind Kumar, A. Purandare, Jay Chen, Arthur Meacham, L. Subramanian

Abstract: Cell phones are increasingly being used as common clients for a wide suite of distributed, database-centric healthcare applications in developing regions. This is particularly true for rural developing regions where the bulk of the healthcare is handled by health workers due to lack of doctors; the widespread availability of cellular services has made mobile devices an important computing platform for enabling healthcare applications for these health workers. Unfortunately, the current SQL model for distributed client/server systems is far too heavy-weight for these applications, particularly in light of the high communications cost and extremely limited data transmission capacity available in these environments. In this demonstration, we describe the Efficient Lightweight Mobile Records (ELMR) system that provides a practical and lightweight database access protocol for accessing and updating records remotely from mobile devices under an extremely bandwidth- and cost-constrained Short Messaging Service (SMS) channel comprising 140-byte packets. We have implemented ELMR using the RMS functionality in J2ME, and integrated it into an HIV treatment application we are developing for use by African health workers.
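
To make the 140-byte constraint mentioned in the abstract above concrete, the following is a minimal sketch of splitting a record into SMS-sized packets and reassembling them out of order. It is not the ELMR protocol: the two-byte header (sequence index, packet count) and the names `split_into_packets` and `reassemble` are assumptions introduced purely for illustration.

```python
PACKET_SIZE = 140  # SMS payload size in bytes, as stated in the abstract
HEADER_SIZE = 2    # hypothetical header: 1 byte sequence index, 1 byte packet count


def split_into_packets(payload: bytes) -> list[bytes]:
    """Split a payload into packets of at most PACKET_SIZE bytes each."""
    chunk = PACKET_SIZE - HEADER_SIZE
    # An empty payload still produces one (header-only) packet.
    chunks = [payload[i:i + chunk] for i in range(0, len(payload), chunk)] or [b""]
    if len(chunks) > 255:
        raise ValueError("payload too large for a 1-byte packet count")
    total = len(chunks)
    return [bytes([index, total]) + body for index, body in enumerate(chunks)]


def reassemble(packets: list[bytes]) -> bytes:
    """Rebuild the payload from packets that may arrive in any order."""
    if not packets:
        raise ValueError("no packets")
    total = packets[0][1]
    by_index = {p[0]: p[2:] for p in packets}
    if set(by_index) != set(range(total)):
        raise ValueError("missing or unexpected packets")
    return b"".join(by_index[i] for i in range(total))
```

Under this assumed layout each packet carries 138 bytes of data, so a 300-byte record is sent as three packets. A real protocol would also need acknowledgement and retransmission, which the sketch leaves out.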

Citations: 10

Session details: Research session 21: indexing

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data Pub Date : 2009-06-29 DOI: 10.1145/3257469

Xuemin Lin

Citations: 0

Asynchronous view maintenance for VLSD databases

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data Pub Date : 2009-06-29 DOI: 10.1145/1559845.1559866

Parag Agrawal, Adam Silberstein, Brian F. Cooper, U. Srivastava, R. Ramakrishnan

Abstract: The query models of the recent generation of very large scale distributed (VLSD) shared-nothing data storage systems, including our own PNUTS and others (e.g. BigTable, Dynamo, Cassandra, etc.) are intentionally simple, focusing on simple lookups and scans and trading query expressiveness for massive scale. Indexes and views can expand the query expressiveness of such systems by materializing more complex access paths and query results. In this paper, we examine mechanisms to implement indexes and views in a massive scale distributed database. For web applications, minimizing update latencies is critical, so we advocate deferring the work of maintaining views and indexes as much as possible. We examine the design space, and conclude that two types of view implementations, called remote view tables (RVTs) and local view tables (LVTs), provide a good tradeoff between system throughput and view staleness. We describe how to construct and maintain such view tables, and how they can be used to implement indexes, group-by-aggregate views, equijoin views and selection views. We also introduce and analyze a consistency model that makes it easier for application developers to cope with the impact of deferred view maintenance. An empirical evaluation quantifies the maintenance costs of our views, and shows that they can significantly reduce the cost of evaluating complex queries.
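
The idea of deferring index maintenance, as described in the abstract above, can be illustrated with a small single-process sketch. It is not the paper's RVT or LVT design: the class `DeferredIndex`, its methods, and the in-memory queue are hypothetical and only show how a write can return before the index is updated, leaving the index temporarily stale.

```python
from collections import deque

_MISSING = object()  # sentinel for "key had no previous value"


class DeferredIndex:
    """Base table plus a secondary index that is maintained lazily."""

    def __init__(self):
        self.base = {}          # primary key -> indexed attribute value
        self.view = {}          # attribute value -> set of primary keys
        self.pending = deque()  # (key, old_value, new_value) not yet applied

    def put(self, key, value):
        """Write to the base table; the index update is only queued."""
        old = self.base.get(key, _MISSING)
        self.base[key] = value
        self.pending.append((key, old, value))

    def maintain(self, max_updates=None):
        """Apply queued updates to the index; returns how many were applied."""
        applied = 0
        while self.pending and (max_updates is None or applied < max_updates):
            key, old, new = self.pending.popleft()
            if old is not _MISSING:
                keys = self.view.get(old, set())
                keys.discard(key)
                if not keys:
                    self.view.pop(old, None)
            self.view.setdefault(new, set()).add(key)
            applied += 1
        return applied

    def lookup(self, value):
        """Read the index; may be stale until maintain() has caught up."""
        return set(self.view.get(value, set()))
```

In this sketch a `lookup` issued right after a `put` may miss the new row until `maintain` runs, which is the kind of staleness an application-facing consistency model has to account for.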

Citations: 84