{"title":"Windows and RSS: beyond blogging","authors":"Sean Lyndersay","doi":"10.1145/1142473.1142563","DOIUrl":"https://doi.org/10.1145/1142473.1142563","url":null,"abstract":"RSS (and related technologies like Atom) are gaining significant traction as a means for allowing users to \"subscribe\" to content on the web and get notified when new content is available. More recently, \"podcasting\" -- a simple extension to RSS to enable references to audio files -- has taken off as a means to subscribe to episodal audio content. More generally, RSS feeds are being used in many arenas to communicate all sorts of different types of content, either using extensions to the RSS format, or simply by transmitting binary files. At its heart, RSS is a very simple XML-based format with very simple semantics, but the potential uses appear endless. This talk will examine many of the uses of RSS, and discuss why this simple format has become so important that Microsoft is building native support for RSS into its next generation operating system and browser platforms. It will also cover many of the technical challenges inherent in building scalable support for RSS into a client operating system.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116657215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using the oracle database as a declarative RSS hub","authors":"D. Gawlick, M. Krishnaprasad, Z. Liu","doi":"10.1145/1142473.1142562","DOIUrl":"https://doi.org/10.1145/1142473.1142562","url":null,"abstract":"The interaction with the Web has historically evolved from static bookmarks to dynamic searches to the current usage of active notification mechanisms based on popular protocols like RSS or Atom. At the same time, a large volume of important source data is still contained in relational databases. The talk will analyze the way the Oracle database participates in the activation of the data and in opening the state changes in a standard and secure way for easy integration with the rest of the push-based Web protocols. We will study the declarative specification of RSS feeds generated based on the state changes detected in the data stored in the Oracle database. Conversely, external RSS feeds can be injected into the database and processed declaratively in conjunction with the rest of the data. Most of the technical pieces required for such a solution are already supported by the database engine (e.g. declarative XML processing, state change notifications, queues, crawlers, continuous queries), effectively turning the database into a declarative XML hub. 
The advantages of using database solutions for such problems in an enterprise context are security, scalability and reliability.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127720760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPUTeraSort: high performance graphics co-processor sorting for large database management","authors":"N. Govindaraju, J. Gray, Ritesh Kumar, Dinesh Manocha","doi":"10.1145/1142473.1142511","DOIUrl":"https://doi.org/10.1145/1142473.1142511","url":null,"abstract":"We present a novel external sorting algorithm using graphics processors (GPUs) on large databases composed of billions of records and wide keys. Our algorithm uses the data parallelism within a GPU along with task parallelism by scheduling some of the memory-intensive and compute-intensive threads on the GPU. Our new sorting architecture provides multiple memory interfaces on the same PC -- a fast and dedicated memory interface on the GPU along with the main memory interface for CPU computations. As a result, we achieve higher memory bandwidth as compared to CPU-based algorithms running on commodity PCs. Our approach takes into account the limited communication bandwidth between the CPU and the GPU, and reduces the data communication between the two processors. Our algorithm also improves the performance of disk transfers and achieves close to peak I/O performance. We have tested the performance of our algorithm on the SortBenchmark and applied it to large databases composed of a few hundred Gigabytes of data. Our results on a 3 GHz Pentium IV PC with $300 NVIDIA 7800 GT GPU indicate a significant performance improvement over optimized CPU-based algorithms on high-end PCs with 3.6 GHz Dual Xeon processors. Our implementation is able to outperform the current high-end PennySort benchmark and results in a higher performance to price ratio. 
Overall, our results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121671555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constraint chaining: on energy-efficient continuous monitoring in sensor networks","authors":"Adam Silberstein, R. Braynard, Jun Yang","doi":"10.1145/1142473.1142492","DOIUrl":"https://doi.org/10.1145/1142473.1142492","url":null,"abstract":"Wireless sensor networks have created new opportunities for data collection in a variety of scenarios, such as environmental and industrial, where we expect data to be temporally and spatially correlated. Researchers may want to continuously collect all sensor data from the network for later analysis. Suppression, both temporal and spatial, provides opportunities for reducing the energy cost of sensor data collection. We demonstrate how both types can be combined for maximal benefit. We frame the problem as one of monitoring node and edge constraints. A monitored node triggers a report if its value changes. A monitored edge triggers a report if the difference between its nodes' values changes. The set of reports collected at the base station is used to derive all node values. We fully exploit the potential of this global inference in our algorithm, CONCH, short for constraint chaining. Constraint chaining builds a network of constraints that are maintained locally, but allow a global view of values to be maintained with minimal cost. Network failure complicates the use of suppression, since either causes an absence of reports. We add enhancements to CONCH to build in redundant constraints and provide a method to interpret the resulting reports in case of uncertainty. 
Using simulation we experimentally evaluate CONCH's effectiveness against competing schemes in a number of interesting scenarios.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126452144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Paper-based mobile access to databases","authors":"B. Signer, M. Norrie, Michael Grossniklaus, R. Belotti, C. Decurtins, N. Weibel","doi":"10.1145/1142473.1142581","DOIUrl":"https://doi.org/10.1145/1142473.1142581","url":null,"abstract":"Our demonstration is a paper-based interactive guide for visitors to the world's largest international arts festival that was developed as part of a project investigating new forms of context-aware information delivery and interaction in mobile environments. Information stored in a database is accessed from a set of interactive paper documents, including a printed festival brochure, a city map and a bookmark. Active areas are defined within the documents and selection of these using a special digital pen causes the corresponding query request along with context data to be sent to a festival application database and the response is returned to the visitor in the form of generated speech output. In addition to paper-based information browsing and transactions such as ticket booking, the digital pen can also be applied for data capture of event ratings and handwritten comments on events. The system integrates three main database components - a cross-media information platform, a content management framework for multi-channel context-aware publishing of data and the festival application database.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132925673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling skew in data streams","authors":"Flip Korn, S. Muthukrishnan, Yihua Wu","doi":"10.1145/1142473.1142495","DOIUrl":"https://doi.org/10.1145/1142473.1142495","url":null,"abstract":"Data stream applications have made use of statistical summaries to reason about the data using nonparametric tools such as histograms, heavy hitters, and join sizes. However, relatively little attention has been paid to modeling stream data parametrically, despite the potential this approach has for mining the data. The challenges to do model fitting at streaming speeds are both technical -- how to continually find fast and reliable parameter estimates on high speed streams of skewed data using small space -- and conceptual -- how to validate the goodness-of-fit and stability of the model online. In this paper, we show how to fit hierarchical (binomial multifractal) and non-hierarchical (Pareto) power-law models on a data stream. We address the technical challenges using an approach that maintains a sketch of the data stream and fits least-squares straight lines; it yields algorithms that are fast, space-efficient, and provide approximations of parameter value estimates with a priori quality guarantees relative to those obtained offline. We address the conceptual challenge by designing fast methods for online goodness-of-fit measurements on a data stream; we adapt the statistical testing technique of examining the quantile-quantile (q-q) plot, to perform online model validation at streaming speeds. As a concrete application of our techniques, we focus on network traffic data which has been shown to exhibit skewed distributions. We complement our analytic and algorithmic results with experiments on IP traffic streams in AT&T's Gigascope® data stream management system, to demonstrate practicality of our methods at line speeds. We measured the stability and robustness of these models over weeks of operational packet data in an IP network. 
In addition, we study an intrusion detection application, and demonstrate the potential of online parametric modeling.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133112792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient query processing on unstructured tetrahedral meshes","authors":"Stratos Papadomanolakis, A. Ailamaki, Julio C. López, Tiankai Tu, D. O'Hallaron, G. Heber","doi":"10.1145/1142473.1142535","DOIUrl":"https://doi.org/10.1145/1142473.1142535","url":null,"abstract":"Modern scientific applications such as fluid dynamics and earthquake modeling heavily depend on massive volumes of data produced by computer simulations. Such applications require new data management capabilities in order to scale to terabyte-scale data volumes. The most common way to discretize the application domain is to decompose it into pyramids, forming an unstructured tetrahedral mesh. Modern simulations generate meshes of high resolution and precision, to be queried by a visualization or analysis tool. Tetrahedral meshes are extremely flexible and therefore vital to accurately model complex geometries, but also are difficult to index. To reduce query execution time, applications either use only subsets of the data or rely on different (less flexible) structures, thereby trading accuracy for speed. This paper presents efficient indexing techniques for common spatial (point and range) queries on tetrahedral meshes. Because the prevailing multidimensional indexing techniques attempt to approximate the tetrahedra using simpler shapes (primarily rectangles), the query performance deteriorates significantly as a function of the mesh's geometric complexity. We develop Directed Local Search (DLS), an efficient indexing algorithm based on mesh topology information that is practically insensitive to the geometric properties of meshes. We show how DLS can be easily and efficiently implemented within modern DBMS without requiring new exotic index structures and complex preprocessing. Finally, we present a new data layout approach for tetrahedral mesh datasets that provides better performance for scientific applications compared to the traditional space filling curves. 
In our PostgreSQL implementation DLS reduces the number of disk page accesses by 26% to 4x, and improves the overall query execution time by 25% to 4.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129318754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quark: an efficient XQuery full-text implementation","authors":"A. Bhaskar, C. Botev, M. Chettiar, Lin Guo, J. Shanmugasundaram, F. Shao, Fan Yang","doi":"10.1145/1142473.1142588","DOIUrl":"https://doi.org/10.1145/1142473.1142588","url":null,"abstract":"The XQuery 1.0 and XPath 2.0 Full-text (XQFT) language has been developed by the W3C to extend XQuery and XPath with full-text search capabilities. XQFT allows users to specify a mix of structured and complex full-text predicates, and also allows users to score/rank such queries. The power and flexibility of XQFT gives rise to two interesting questions. First, is it possible to efficiently integrate a full-function XML query language with sophisticated full-text search? Second, is it possible to score and rank arbitrary XQuery and XQFT queries? In this demonstration, we present evidence that it is indeed possible to achieve the above goals. We demonstrate the Quark open-source data management system and show how we can seamlessly and efficiently integrate structured and unstructured search over XML data. In particular, we demonstrate (a) techniques for efficiently evaluating keyword search over virtual XML views, and (b) a framework for scoring both structured and full-text predicates.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131022064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ranking objects based on relationships","authors":"K. Chakrabarti, Venkatesh Ganti, Jiawei Han, Dong Xin","doi":"10.1145/1142473.1142516","DOIUrl":"https://doi.org/10.1145/1142473.1142516","url":null,"abstract":"In many document collections, documents are related to objects such as document authors, products described in the document, or persons referred to in the document. In many applications, the goal is to find these objects that best match a set of keywords. However, the keywords may not necessarily occur in the target objects; they occur only in the documents. For example, in a product review database, a user might search for names of products (say, laptops) using keywords like \"lightweight\" and \"business use\" that occur only in the reviews but not in the names of laptops. In order to answer these queries, we need to exploit relationships between documents containing the keywords and the target objects related to those documents. Current keyword query paradigms do not exploit these relationships effectively and hence are inefficient for these queries. In this paper, we consider a class of queries called the \"object finder\" queries. Our main intuition is to exploit the relationships between searchable documents and related objects and further \"aggregate\" the document scores from these relationships in order to find the best ranking target objects. Building upon existing keyword search engines such as full text search, we design efficient algorithms that exploit the requirement of only the best k target objects to terminate early. The main challenge here is to push early termination through blocking operators such as group by and aggregation. Our experiments with real datasets and workloads demonstrate the effectiveness of our techniques. 
Although we present our techniques in the context of keyword search, our techniques apply to other types of ranked searches (e.g., multimedia search) as well.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121407957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Declarative networking: language, execution and optimization","authors":"B. T. Loo, Tyson Condie, M. Garofalakis, David E. Gay, J. Hellerstein, Petros Maniatis, R. Ramakrishnan, Timothy Roscoe, I. Stoica","doi":"10.1145/1142473.1142485","DOIUrl":"https://doi.org/10.1145/1142473.1142485","url":null,"abstract":"The networking and distributed systems communities have recently explored a variety of new network architectures, both for application-level overlay networks, and as prototypes for a next-generation Internet architecture. In this context, we have investigated declarative networking: the use of a distributed recursive query engine as a powerful vehicle for accelerating innovation in network architectures [23, 24, 33]. Declarative networking represents a significant new application area for database research on recursive query processing. In this paper, we address fundamental database issues in this domain. First, we motivate and formally define the Network Datalog (NDlog) language for declarative network specifications. Second, we introduce and prove correct relaxed versions of the traditional semi-naïve query evaluation technique, to overcome fundamental problems of the traditional technique in an asynchronous distributed setting. Third, we consider the dynamics of network state, and formalize the \"eventual consistency\" of our programs even when bursts of updates can arrive in the midst of query execution. Fourth, we present a number of query optimization opportunities that arise in the declarative networking context, including applications of traditional techniques as well as new optimizations. 
Last, we present evaluation results of the above ideas implemented in our P2 declarative networking system, running on 100 machines over the Emulab network testbed.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114263657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}