{"title":"Efficient query processing in geographic web search engines","authors":"Yen-Yu Chen, Torsten Suel, Alexander Markowetz","doi":"10.1145/1142473.1142505","DOIUrl":"https://doi.org/10.1145/1142473.1142505","url":null,"abstract":"Geographic web search engines allow users to constrain and order search results in an intuitive manner by focusing a query on a particular geographic region. Geographic search technology, also called local search, has recently received significant interest from major search engine companies. Academic research in this area has focused primarily on techniques for extracting geographic knowledge from the web. In this paper, we study the problem of efficient query processing in scalable geographic search engines. Query processing is a major bottleneck in standard web search engines, and the main reason for the thousands of machines used by the major engines. Geographic search engine query processing is different in that it requires a combination of text and spatial data processing techniques. We propose several algorithms for efficient query processing in geographic search engines, integrate them into an existing web search query processor, and evaluate them on large sets of real data and query traces.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114480898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PHP: supporting the new paradigm of situational and composite web applications","authors":"Andi Gutmans","doi":"10.1145/1142473.1142553","DOIUrl":"https://doi.org/10.1145/1142473.1142553","url":null,"abstract":"In this paper, I describe what we see as a paradigm shift in software development and how PHP plays into this change.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115792185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"XPORT: extensible profile-driven overlay routing trees","authors":"Olga Papaemmanouil, Yanif Ahmad, U. Çetintemel, John Jannotti, Y. Yildirim","doi":"10.1145/1142473.1142583","DOIUrl":"https://doi.org/10.1145/1142473.1142583","url":null,"abstract":"XPORT is a profile-driven distributed data collection and dissemination system that supports an extensible set of data types, profiles, and optimization metrics. XPORT efficiently builds a generic tree-based overlay network, which can be customized per application using a small number of methods that encapsulate application-specific data-profile matching, aggregation, and optimization logic. The clean separation between the \"plumbing\" and \"application\" enables XPORT to uniformly and easily support disparate dissemination-based applications such as content-based feed dissemination and application-level multicast. We propose to demonstrate the basic XPORT system, featuring its extensible optimization framework that facilitates easy specification of a wide range of useful performance goals and a continuous, adaptive optimization model to achieve these goals under changing network and application conditions. We will use two different underlying applications, an RSS feed dissemination application and a multiplayer network game, along with visual system-monitoring tools to illustrate the extensibility and the operational aspects of XPORT.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121419902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forensic analysis of database tampering","authors":"Kyriacos E. Pavlou, R. Snodgrass","doi":"10.1145/1142473.1142487","DOIUrl":"https://doi.org/10.1145/1142473.1142487","url":null,"abstract":"Mechanisms now exist that detect tampering of a database, through the use of cryptographically-strong hash functions. This paper addresses the next problem, that of determining who, when, and what, by providing a systematic means of performing forensic analysis after such tampering has been uncovered. We introduce a schematic representation termed a \"corruption diagram\" that aids in intrusion investigation. We use these diagrams to fully analyze the original proposal, that of a linked sequence of hash values. We examine the various kinds of intrusions that are possible, including retroactive, introactive, backdating, and postdating intrusions. We then introduce successively more sophisticated forensic analysis algorithms: the monochromatic, RGB, and polychromatic algorithms, and characterize the \"forensic strength\" of these algorithms. We show how forensic analysis can efficiently extract a good deal of information concerning a corruption event.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121454528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recovery from \"bad\" user transactions","authors":"D. Lomet, Zografoula Vagena, R. Barga","doi":"10.1145/1142473.1142512","DOIUrl":"https://doi.org/10.1145/1142473.1142512","url":null,"abstract":"User written transaction code is responsible for the \"C\" in ACID transactions, i.e., taking the database from one consistent state to the next. However, user transactions can be flawed and lead to inconsistent (or invalid) states. Database systems usually correct invalid data using \"point in time\" recovery, a costly process that installs a backup and rolls it forward. The result is long outages and the \"de-commit\" of many valid transactions, which must then be re-submitted, frequently manually. We have implemented in our transaction-time database system a technique in which only data tainted by a flawed transaction and transactions dependent upon its updates are \"removed\". This process identifies and quarantines tainted data despite the complication of determining transactions dependent on data written by the flawed transaction. A further property of our implementation is that no backup needs to be installed for this because the prior transaction-time states provide an online backup.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"154 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122780711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A system for specification and verification of interactive, data-driven web applications","authors":"L. Sui","doi":"10.1145/1142473.1142584","DOIUrl":"https://doi.org/10.1145/1142473.1142584","url":null,"abstract":"When comparing alternative query execution plans (QEPs), a cost-based query optimizer in a relational database management system needs to estimate the selectivity of conjunctive predicates. To avoid inaccurate independence assumptions, modern optimizers try to exploit multivariate statistics (MVS) that provide knowledge about joint frequencies in a table of a relation. Because the complete joint distribution is almost always too large to store, optimizers are given only partial knowledge about this distribution. As a result, there exist multiple, non-equivalent ways to estimate the selectivity of a conjunctive predicate. To consistently combine the partial knowledge during the estimation process, existing optimizers employ cumbersome ad hoc heuristics. These methods unjustifiably ignore valuable information, and the optimizer tends to favor QEPs for which the least information is available. This bias problem yields poor QEP quality and performance. We demonstrate MAXENT, a novel approach based on the maximum entropy principle, prototyped in IBM DB2 LUW. We illustrate MAXENT's ability to consistently estimate the selectivity of conjunctive predicates on a per-table basis. In contrast to the DB2 optimizer's current ad hoc methods, we show how MAXENT exploits all available information about the joint column distribution and thus avoids the bias problem. For some complex queries against a real-world database, we show that MAXENT improves selectivity estimates by orders of magnitude relative to the current DB2 optimizer, and also show how these improved estimate influence plan choices as well as query execution times.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122976372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic physical design tuning: workload as a sequence","authors":"S. Agrawal, E. Chu, Vivek R. Narasayya","doi":"10.1145/1142473.1142549","DOIUrl":"https://doi.org/10.1145/1142473.1142549","url":null,"abstract":"The area of automatic selection of physical database design to optimize the performance of a relational database system based on a workload of SQL queries and updates has gained prominence in recent years. Major database vendors have released automated physical database design tools with the goal of reducing the total cost of ownership. An important assumption underlying these tools is that the workload is a set of SQL statements. In this paper, we show that being able to treat the workload as a sequence, i.e., exploiting the ordering of statements can significantly broaden the usage of such tools. We present scenarios where exploiting sequence information in the workload is crucial for performance tuning. We also propose techniques for addressing the technical challenges arising from treating the workload as a sequence. We evaluate the effectiveness of our techniques through experiments on Microsoft SQL Server.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128820116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-sensitive ranking","authors":"R. Agrawal, R. Rantzau, Evimaria Terzi","doi":"10.1145/1142473.1142517","DOIUrl":"https://doi.org/10.1145/1142473.1142517","url":null,"abstract":"Contextual preferences take the form that item i1 is preferred to item i2 in the context of X. For example, a preference might state the choice for Nicole Kidman over Penelope Cruz in drama movies, whereas another preference might choose Penelope Cruz over Nicole Kidman in the context of Spanish dramas. Various sources provide preferences independently and thus preferences may contain cycles and contradictions. We reconcile democratically the preferences accumulated from various sources and use them to create a priori orderings of tuples in an off-line preprocessing step. Only a few representative orders are saved, each corre-sponding to a set of contexts. These orders and associated contexts are used at query time to expeditiously provide ranked answers. We formally define contextual preferences, provide algorithms for creating orders and processing queries, and present experimental results that show their efficacy and practical utility.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129407536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Relaxed-currency serializability for middle-tier caching and replication","authors":"P. Bernstein, A. Fekete, Hongfei Guo, R. Ramakrishnan, Pradeep Tamma","doi":"10.1145/1142473.1142540","DOIUrl":"https://doi.org/10.1145/1142473.1142540","url":null,"abstract":"Many applications, such as e-commerce, routinely use copies of data that are not in sync with the database due to heuristic caching strategies used to enhance performance. We study concurrency control for a transactional model that allows update transactions to read out-of-date copies. Each read operation carries a \"freshness constraint\" that specifies how fresh a copy must be in order to be read. We offer a definition of correctness for this model and present algorithms to ensure several of the most interesting freshness constraints. We outline a serializability-theoretic correctness proof and present the results of a detailed performance study.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129781868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}