{"title":"ExQueX: exploring and querying XML documents","authors":"B. Kimelfeld, Y. Sagiv, Gidi Weber","doi":"10.1145/1559845.1559993","DOIUrl":"https://doi.org/10.1145/1559845.1559993","url":null,"abstract":"ExQueX is an interactive system for exploring and querying XML documents. The exploration is done by searching, ranking and filtering, and it enables users to discover relationships that exist in a given document. The results of the exploration can be used either directly as tree queries or as building blocks in the process of formulating more complex queries. The latter is done by formulating an abstract tree query, whose edges represent relationships that are resolved by using concrete results from the exploration phase. ExQueX facilitates fast and clear understanding of both the contents and complex structures of XML documents.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124400027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Industrial session 5: transactions, security, and cashing","authors":"Bettina Kemme","doi":"10.1145/3257474","DOIUrl":"https://doi.org/10.1145/3257474","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115748354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust web extraction: an approach based on a probabilistic tree-edit model","authors":"Nilesh N. Dalvi, P. Bohannon, Fei Sha","doi":"10.1145/1559845.1559882","DOIUrl":"https://doi.org/10.1145/1559845.1559882","url":null,"abstract":"On script-generated web sites, many documents share common HTML tree structure, allowing wrappers to effectively extract information of interest. Of course, the scripts and thus the tree structure evolve over time, causing wrappers to break repeatedly, and resulting in a high cost of maintaining wrappers. In this paper, we explore a novel approach: we use temporal snapshots of web pages to develop a tree-edit model of HTML, and use this model to improve wrapper construction. We view the changes to the tree structure as suppositions of a series of edit operations: deleting nodes, inserting nodes and substituting labels of nodes. The tree structures evolve by choosing these edit operations stochastically. Our model is attractive in that the probability that a source tree has evolved into a target tree can be estimated efficiently--in quadratic time in the size of the trees--making it a potentially useful tool for a variety of tree-evolution problems. We give an algorithm to learn the probabilistic model from training examples consisting of pairs of trees, and apply this algorithm to collections of web-page snapshots to derive HTML-specific tree edit models. Finally, we describe a novel wrapper-construction framework that takes the tree-edit model into account, and compare the quality of resulting wrappers to that of traditional wrappers on synthetic and real HTML document examples.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"165 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121032263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiewen Huang, Lyublena Antova, Christoph E. Koch, Dan Olteanu
{"title":"MayBMS: a probabilistic database management system","authors":"Jiewen Huang, Lyublena Antova, Christoph E. Koch, Dan Olteanu","doi":"10.1145/1559845.1559984","DOIUrl":"https://doi.org/10.1145/1559845.1559984","url":null,"abstract":"MayBMS is a state-of-the-art probabilistic database management system which leverages the strengths of previous database research for achieving scalability. As a proof of concept for its ease of use, we have built on top of MayBMS a Web-based application that offers NBA-related information based on what-if analysis of team dynamics using data available at www.nba.com.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122340357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring schema repositories with schemr","authors":"Kuang Chen, J. Madhavan, A. Halevy","doi":"10.1145/1559845.1559991","DOIUrl":"https://doi.org/10.1145/1559845.1559991","url":null,"abstract":"Schemr is a schema search engine, and provides users the ability to search for and visualize schemas stored in a metadata repository. Users may search by keywords and by example -- using schema fragments as query terms. Schemr uses a novel search algorithm, based on a combination of text search and schema matching techniques, as well as a structurally-aware scoring metric. Schemr presents search results in a GUI that allows users to explore which elements match and how well they do. The GUI supports interactions, including panning, zooming, layout and drilling-in. We demonstrate schema search and visualization, introduce Schemr as a new component of the information integration toolbox, and discuss its benefits in several applications.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123019349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FlashLogging: exploiting flash devices for synchronous logging performance","authors":"Shimin Chen","doi":"10.1145/1559845.1559855","DOIUrl":"https://doi.org/10.1145/1559845.1559855","url":null,"abstract":"Synchronous transactional logging is the central mechanism for ensuring data persistency and recoverability in database systems. Unfortunately, magnetic disks are ill-suited for the small sequential write pattern of synchronous logging. Alternative solutions (e.g., backup servers or sophisticated battery-backed write caches in high-end disk arrays) are either expensive or complicated. In this paper, we exploit flash devices for synchronous logging based on the observation that flash devices support small sequential writes well. Comparing a wide variety of flash devices, we find that USB flash drives are a good match for this task because of its unique characteristics: widely available USB ports, hot-plug capability useful for coping with flash wear, and low price so that multiple drives are affordable. We propose FlashLogging, a logging solution that exploits multiple (USB) flash drives for synchronous logging. We identify and address four challenges: (i) efficiently exploiting multiple flash drives for logging; (ii) coping with the large variance of write latencies because of device erasure operations; (iii) efficient recovery processing; and (iv) combining flash drives and disks for better logging and recovery performance. We implemented our solution within MySQL-InnoDB. Our real machine experiments running online transaction processing workloads (TPCC) show that FlashLogging achieves up to 5.7X improvements over magnetic-disk-based logging, and obtains up to 98.6% of the ideal performance. We further compare our design with one that uses Solid-State Drives (SSDs), and find that although SSDs improve logging performance, multiple USB flash drives can achieve comparable or better performance with much lower price.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"130 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123036923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Research session 14: understanding data and queries","authors":"D. Srivastava","doi":"10.1145/3257462","DOIUrl":"https://doi.org/10.1145/3257462","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117076698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arvind Kumar, A. Purandare, Jay Chen, Arthur Meacham, L. Subramanian
{"title":"ELMR: lightweight mobile health records","authors":"Arvind Kumar, A. Purandare, Jay Chen, Arthur Meacham, L. Subramanian","doi":"10.1145/1559845.1559974","DOIUrl":"https://doi.org/10.1145/1559845.1559974","url":null,"abstract":"Cell phones are increasingly being used as common clients for a wide suite of distributed, database-centric healthcare applications in developing regions. This is particularly true for rural developing regions where the bulk of the healthcare is handled by health workers due to lack of doctors; the widespread availability of cellular services have made mobile devices as an important computing platform for enabling healthcare applications for these health workers. Unfortunately, the current SQL model for distributed client/server systems is far too heavy-weight for these applications, particularly in light of the high communications cost and extremely limited data transmission capacity available in these environments. In this demonstration, we describe the Efficient Lightweight Mobile Records (ELMR) system that provides a practical and lightweight database access protocol for accessing and updating records remotely from mobile devices under an extremely bandwidth and cost-constrained Short Messaging Service (SMS) channel comprising of 140 byte packets. We have implemented ELMR using the RMS functionality in J2ME, and integrated it into an HIV treatment application we are developing for use by African health workers.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"696 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128597575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Research session 21: indexing","authors":"Xuemin Lin","doi":"10.1145/3257469","DOIUrl":"https://doi.org/10.1145/3257469","url":null,"abstract":"","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124709929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parag Agrawal, Adam Silberstein, Brian F. Cooper, U. Srivastava, R. Ramakrishnan
{"title":"Asynchronous view maintenance for VLSD databases","authors":"Parag Agrawal, Adam Silberstein, Brian F. Cooper, U. Srivastava, R. Ramakrishnan","doi":"10.1145/1559845.1559866","DOIUrl":"https://doi.org/10.1145/1559845.1559866","url":null,"abstract":"The query models of the recent generation of very large scale distributed (VLSD) shared-nothing data storage systems, including our own PNUTS and others (e.g. BigTable, Dynamo, Cassandra, etc.) are intentionally simple, focusing on simple lookups and scans and trading query expressiveness for massive scale. Indexes and views can expand the query expressiveness of such systems by materializing more complex access paths and query results. In this paper, we examine mechanisms to implement indexes and views in a massive scale distributed database. For web applications, minimizing update latencies is critical, so we advocate deferring the work of maintaining views and indexes as much as possible. We examine the design space, and conclude that two types of view implementations, called remote view tables (RVTs) and local view tables (LVTs), provide good tradeoff between system throughput and minimizing view staleness. We describe how to construct and maintain such view tables, and how they can be used to implement indexes, group-by-aggregate views, equijoin views and selection views. We also introduce and analyze a consistency model that makes it easier for application developers to cope with the impact of deferred view maintenance. An empirical evaluation quantifies the maintenance costs of our views, and shows that they can significantly improve the cost of evaluating complex queries.","PeriodicalId":344093,"journal":{"name":"Proceedings of the 2009 ACM SIGMOD International Conference on Management of data","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123796415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}