Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.): latest publications
{"title":"Unraveling the duplicate-elimination problem in XML-to-SQL query translation","authors":"R. Krishnamurthy, R. Kaushik, J. Naughton","doi":"10.1145/1017074.1017088","DOIUrl":"https://doi.org/10.1145/1017074.1017088","url":null,"abstract":"We consider the scenario where existing relational data is exported as XML. In this context, we look at the problem of translating XML queries into SQL. XML query languages have two different notions of duplicates: node-identity based and value-based. Path expression queries have an implicit node-identity based duplicate elimination built into them. On the other hand, SQL only supports value-based duplicate elimination. In this paper, using a simple path expression query we illustrate the problems that arise when we attempt to simulate the node-identity based duplicate elimination using value-based duplicate elimination in the SQL queries. We show how a general solution for this problem covering the class of views considered in published literature requires a fairly complex mechanism.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76922060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Beaver, Nicholas Morsillo, K. Pruhs, Panos K. Chrysanthis, V. Liberatore
{"title":"Scalable dissemination: what's hot and what's not","authors":"J. Beaver, Nicholas Morsillo, K. Pruhs, Panos K. Chrysanthis, V. Liberatore","doi":"10.1145/1017074.1017084","DOIUrl":"https://doi.org/10.1145/1017074.1017084","url":null,"abstract":"A major problem in web database applications and on the Internet in general is the scalable delivery of data. One proposed solution for this problem is a hybrid system that uses multicast push to scalably deliver the most popular data, and reserves traditional unicast pull for delivery of less popular data. However, such a hybrid scheme introduces a variety of data management problems at the server. In this paper we examine three of these problems: the push popularity problem, the document classification problem, and the bandwidth division problem. The push popularity problem is to estimate the popularity of the documents in the web site. The document classification problem is to determine which documents should be pushed and which documents must be pulled. The bandwidth division problem is to determine how much of the server bandwidth to devote to pushed documents and how much of the server bandwidth should be reserved for pulled documents. We propose simple and elegant solutions for these problems. We report on experiments with our system that validate our algorithms.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81781715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiyang Chen, Lisheng Sun, Osmar R Zaiane, R. Goebel
{"title":"Visualizing and discovering web navigational patterns","authors":"Jiyang Chen, Lisheng Sun, Osmar R Zaiane, R. Goebel","doi":"10.1145/1017074.1017079","DOIUrl":"https://doi.org/10.1145/1017074.1017079","url":null,"abstract":"Web site structures are complex to analyze. Cross-referencing the web structure with navigational behaviour adds to the complexity of the analysis. However, this convoluted analysis is necessary to discover useful patterns and understand the navigational behaviour of web site visitors, whether to improve web site structures, provide intelligent on-line tools or offer support to human decision makers. Moreover, interactive investigation of web access logs is often desired since it allows ad hoc discovery and examination of patterns not a priori known. Various visualization tools have been provided for this task but they often lack the functionality to conveniently generate new patterns. In this paper we propose a visualization tool to visualize web graphs, representations of web structure overlaid with information and pattern tiers. We also propose a web graph algebra to manipulate and combine web graphs and their layers in order to discover new patterns in an ad hoc manner.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86102695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Kamps, Maarten Marx, M. de Rijke, Börkur Sigurbjörnsson
{"title":"Best-match querying from document-centric XML","authors":"J. Kamps, Maarten Marx, M. de Rijke, Börkur Sigurbjörnsson","doi":"10.1145/1017074.1017089","DOIUrl":"https://doi.org/10.1145/1017074.1017089","url":null,"abstract":"On the Web, there is a pervasive use of XML to give lightweight semantics to textual collections. Such document-centric XML collections require a query language that can gracefully handle structural constraints as well as constraints on the free text of the documents. Our main contributions are three-fold. First, we outline two fragments of XPath tailored to users that have varying degrees of understanding of the XML structure used, and give both syntactic and semantic characterizations of these fragments. Second, we extend XPath with an about function having a best-match semantics based on the relevance of the document component for the expressed information need. Third, we evaluate the resulting query language using the INEX 2003 test suite, and show that best-match approaches outperform exact-match approaches for evaluating content-and-structure queries.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75119231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Querying bi-level information","authors":"S. Murthy, D. Maier, L. Delcambre","doi":"10.1145/1017074.1017078","DOIUrl":"https://doi.org/10.1145/1017074.1017078","url":null,"abstract":"In our research on superimposed information management, we have developed applications where information elements in the superimposed layer serve to annotate, comment, restructure, and combine selections from one or more existing documents in the base layer. Base documents tend to be unstructured or semi-structured (HTML pages, Excel spreadsheets, and so on) with marks delimiting selections. Selections in the base layer can be programmatically accessed via marks to retrieve content and context. The applications we have built to date allow creation of new marks and new superimposed elements (that use marks), but they have been browse-oriented and tend to expose the line between superimposed and base layers. Here, we present a new access capability, called bi-level queries, that allows an application or user to query over both layers as a whole. Bi-level queries provide an alternative style of data integration where only relevant portions of a base document are mediated (not the whole document) and the superimposed layer can add information not present in the base layer. We discuss our framework for superimposed information management, an initial implementation of a bi-level query system with an XML Query interface, and suggest mechanisms to improve scalability and performance.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73996505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DTDs versus XML schema: a practical study","authors":"G. Bex, F. Neven, J. V. D. Bussche","doi":"10.1145/1017074.1017095","DOIUrl":"https://doi.org/10.1145/1017074.1017095","url":null,"abstract":"Among the various proposals answering the shortcomings of Document Type Definitions (DTDs), XML Schema is the most widely used. Although DTDs and XML Schema Definitions (XSDs) differ syntactically, they are still quite related on an abstract level. Indeed, freed from all syntactic sugar, XML Schemas can be seen as an extension of DTDs with a restricted form of specialization. In the present paper, we inspect a number of DTDs and XSDs harvested from the web and try to answer the following questions: (1) which of the extra features/expressiveness of XML Schema not allowed by DTDs are effectively used in practice; and, (2) how sophisticated are the structural properties (i.e. the nature of regular expressions) of the two formalisms. It turns out that at present real-world XSDs only sparingly use the new features introduced by XML Schema: on a structural level the vast majority of them can already be defined by DTDs. Further, we introduce a class of simple regular expressions and obtain that a surprisingly high fraction of the content models belong to this class. The latter result sheds light on the justification of simplifying assumptions that sometimes have to be made in XML research.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81163356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic multicast for content-based stream dissemination","authors":"Olga Papaemmanouil, U. Çetintemel","doi":"10.1145/1017074.1017085","DOIUrl":"https://doi.org/10.1145/1017074.1017085","url":null,"abstract":"We consider the problem of content-based routing and dissemination of highly-distributed, fast data streams from multiple sources to multiple receivers. Our target application domain includes real-time, stream-based monitoring applications and large-scale event dissemination. We introduce SemCast, a new semantic multicast approach that, unlike previous approaches, eliminates the need for content-based forwarding at interior brokers and facilitates fine-grained control over the construction of dissemination overlays. We present the initial design of SemCast and provide an outline of the architectural and algorithmic challenges as well as our initial solutions. Preliminary experimental results show that SemCast can significantly reduce overall bandwidth requirements compared to traditional event-dissemination approaches.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76776563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Checking potential validity of XML documents","authors":"I. Iacob, Alex Dekhtyar, M. Dekhtyar","doi":"10.1145/1017074.1017097","DOIUrl":"https://doi.org/10.1145/1017074.1017097","url":null,"abstract":"The process of creation of document-centric XML documents often starts with a prepared textual content, into which the editor introduces markup. In such situations, intermediate XML is almost never valid with respect to the DTD/Schema used for the encoding. At the same time, it is important to ensure that at each moment of time, the editor is working with an XML document that can be enriched with further markup to become valid. In this paper we introduce the notion of potential validity of XML documents, which allows us to distinguish between XML documents that are invalid because the encoding is simply incomplete and XML documents that are invalid because some of the DTD rules guiding the structure of the encoding were violated during the markup process. We give a linear-time algorithm for checking potential validity for documents.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80683901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges in selecting paths for navigational queries: trade-off of benefit of path versus cost of plan","authors":"Maria-Esther Vidal, L. Raschid, Julián Mestre","doi":"10.1145/1017074.1017091","DOIUrl":"https://doi.org/10.1145/1017074.1017091","url":null,"abstract":"Life sciences sources are characterized by a complex graph of overlapping sources, and multiple alternate links between sources. A (navigational) query may be answered by traversing multiple alternate paths between a start source and a target source. Each of these paths may have dissimilar benefit, e.g., the cardinality of result objects that are reached in the target source. Paths may also have dissimilar costs of evaluation, i.e., the execution cost of a query evaluation plan for a path. In prior research, we developed ESearch, an algorithm based on a Deterministic Finite Automaton (DFA), which exhaustively enumerates all paths to answer a navigational query. The challenge is to develop heuristics that improve on the exhaustive ESearch solution and identify good utility functions that can rank the sources, the links between sources, and the sub-paths that are already visited, in order to quickly produce paths that have the highest benefit and the least cost. In this paper, we present a heuristic that uses local utility functions to rank sources, using either the benefit attributed to the source, the cost of a plan using the source, or both. The heuristic will limit its search to some Top XX% of the ranked sources. To compare ESearch and the heuristic, we construct a Pareto surface of all dominant solutions produced by ESearch, with respect to benefit and cost. We choose the Top 25% of the ESearch solutions that are in the Pareto surface. We compare the paths produced by the heuristic to this Top 25% of ESearch solutions with respect to precision and recall. This motivates the need for further research on developing a more efficient algorithm and better utility functions.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74769226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"One torus to rule them all: multi-dimensional queries in P2P systems","authors":"Prasanna Ganesan, Beverly Yang, H. Garcia-Molina","doi":"10.1145/1017074.1017081","DOIUrl":"https://doi.org/10.1145/1017074.1017081","url":null,"abstract":"Peer-to-peer systems enable access to data spread over an extremely large number of machines. Most P2P systems support only simple lookup queries. However, many new applications, such as P2P photo sharing and massively multi-player games, would benefit greatly from support for multidimensional range queries. We show how such queries may be supported in a P2P system by adapting traditional spatial-database technologies with novel P2P routing networks and load-balancing algorithms. We show how to adapt two popular spatial-database solutions - kd-trees and space-filling curves - and experimentally compare their effectiveness.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79209748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}