Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)最新文献
A. Quamar, Fatma Özcan, Konstantinos Xirogiannopoulos
{"title":"Discovery and Creation of Rich Entities for Knowledge Bases","authors":"A. Quamar, Fatma Özcan, Konstantinos Xirogiannopoulos","doi":"10.1145/3214708.3214712","DOIUrl":"https://doi.org/10.1145/3214708.3214712","url":null,"abstract":"Businesses and professional organizations from a variety of different domains such as finance, weather, healthcare, social networks, etc., produce massive amounts of unstructured, semi-structured and structured data. Knowledge bases, enable querying and analysis of integrated content derived from such data available as open, third party and propriety data sets. Many knowledge bases today, provide an entity-centric view over the integrated content by using domain-specific ontologies. These entity-centric views enable querying individual real-world entities, as well as exploring exact information (such as address or net revenue of a company) through explicit querying using languages such as SQL or SPARQL. Although very useful for many business and commercial applications, this may not be sufficient for the exploration of relevant and context specific information associated with real-world entities stored in these knowledge bases. Users often need to resort to a manual and tedious process of exploration using ad-hoc queries to gather the required information. To enhance user experience and ameliorate the problem of relevant data exploration, we propose the concept of Rich Entities. These rich entities comprise of all the relevant and context specific information grouped together around real-world entities and served as efficient and meaningful responses to user queries against these entities in a knowledge base. These rich entities are created by grouping together information not only from a single entity represented as an ontology concept, but also related concepts and properties as specified by the domain ontology. In this paper we propose several novel techniques and algorithms to automatically detect, learn, and create domain-specific rich entities. We use inputs from query patterns in existing query workloads against knowledge bases, and leverage the structure and relationships between entities defined in the domain ontology. Our techniques are very effective and can be applied to a wide variety of application domains thus adding great value to data exploration and information extraction from entity-centric real-world knowledge bases.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88984579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Luca Nanni, Pietro Pinoli, Arif Canakoglu, S. Ceri
{"title":"Exploring Genomic Datasets: from Batch to Interactive and Back","authors":"Luca Nanni, Pietro Pinoli, Arif Canakoglu, S. Ceri","doi":"10.1145/3214708.3214710","DOIUrl":"https://doi.org/10.1145/3214708.3214710","url":null,"abstract":"Genomic data management is focused on achieving high performance over big datasets using batch, cloud-based architectures; this enables the execution of massive pipelines, but hampers the capability of exploring the solution space when it is not well-defined, by choosing different experimental samples or query extraction parameters. We present PyGMQL, a Python-based interoperability software layer that enables testing of experimental pipelines; PyGMQL solves the impedance mismatch between a batch execution environment and the agile programming style of Python, and provides transparency of access when exploration requires integrating local and remote resources. Wrapping PyGMQL and Python primitives within Jupyter notebooks guarantees reproducibility of the pipeline when used in different contexts or by different scientists. The software is freely available at https://github.com/DEIB-GECO/PyGMQL.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88479228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Pros and Cons of Ranked Entities with COMPETE","authors":"Kiril Panev, S. Michel","doi":"10.1145/3214708.3214709","DOIUrl":"https://doi.org/10.1145/3214708.3214709","url":null,"abstract":"We present COMPETE, a novel approach to explore data using rankings. Utilizing rankings, which succinctly summarize the relative performance of entities, is a very intuitive way to inspecting underlying data. In this work we present an approach where users can understand the dominance of entities and interactively explore data. The approach harnesses diverse precomputed rankings, capturing many different aspects of possible user interest. For a given set of input entities, COMPETE identifies entities that are dominating or are dominated by the input, thus expanding on the relative performance of the input and giving focus on other related entities which are always better (or worse) than the input. We consider several aspects of the dominance relationship and provide different approaches that reflect these nuances. We report on the results of an experimental evaluation over data obtained from the Internet Movie Database (IMDb).","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74603229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recommendations for Explorations based on Graphs","authors":"Marialena Kyriakidi, G. Koutrika, Y. Ioannidis","doi":"10.1145/3214708.3214713","DOIUrl":"https://doi.org/10.1145/3214708.3214713","url":null,"abstract":"Recommendations are an integral part of data exploration. Existing approaches, however, consider a limited model of recommendations. In this vision paper, we lay the ground for a graph-based approach for recommendations that allows significant flexibility in capturing both data and recommendations and process them efficiently. We determine the requirements of a desired solution and illustrate the overall idea with an example based on the Yelp dataset.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89837465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rakan Alseghayer, Daniel Petrov, Panos K. Chrysanthis
{"title":"Strategies for Detection of Correlated Data Streams","authors":"Rakan Alseghayer, Daniel Petrov, Panos K. Chrysanthis","doi":"10.1145/3214708.3214714","DOIUrl":"https://doi.org/10.1145/3214708.3214714","url":null,"abstract":"There is an increasing demand for real-time analysis of large volumes of data streams that are produced at high velocity. The most recent data needs to be processed within a specified delay target in order for the analysis to lead to actionable result. In this paper we present an effective solution for the analysis of such data streams that is based upon a 3-fold approach that combines (1) incremental sliding-window computation of aggregates, to avoid unnecessary recomputations, (2) intelligent scheduling of computation steps and operations, driven by a utility function within a micro-batch, and (3) an exploration strategy that tunes the utility function. Specifically, we propose eight strategies that explore correlated pairs of live data streams across consecutive micro-batches. Our experimental evaluation on a real dataset shows that some strategies are more suitable to identifying high numbers of correlated pairs of live data streams, already known from previous micro-batches, while others are more suitable to identifying previously unseen pairs of live data streams across consecutive micro-batches.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74660549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaofeng Yang, Mirek Riedewald, Rundong Li, Wolfgang Gatterbauer
{"title":"Any-k Algorithms for Exploratory Analysis with Conjunctive Queries.","authors":"Xiaofeng Yang, Mirek Riedewald, Rundong Li, Wolfgang Gatterbauer","doi":"10.1145/3214708.3214711","DOIUrl":"https://doi.org/10.1145/3214708.3214711","url":null,"abstract":"<p><p>We recently proposed the notion of <i>any-k queries</i>, together with the KARPET algorithm, for tree-pattern search in labeled graphs. Any-<i>k</i> extends top-<i>k</i> by not requiring a pre-specified value of <i>k</i>. Instead, an any-<i>k</i> algorithm returns as many of the top-ranked results as possible, for a given time budget. Given additional time, it produces the next-highest ranked results quickly as well. It can be stopped anytime, but may have to continue until all results are returned. In the latter case, any-k takes times similar to an algorithm that first produces all results and then sorts them. We summarize KARPET and argue that it can be extended to support any-<i>k</i> exploratory search for arbitrary conjunctive queries.</p>","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1145/3214708.3214711","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39258977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Senjuti Basu Roy, K. Stefanidis, G. Koutrika, Mirek Riedewald, L. Lakshmanan
{"title":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web","authors":"Senjuti Basu Roy, K. Stefanidis, G. Koutrika, Mirek Riedewald, L. Lakshmanan","doi":"10.1145/3214708","DOIUrl":"https://doi.org/10.1145/3214708","url":null,"abstract":"The purpose of the ExploreDB workshop is to bring together researchers and practitioners that approach data exploration from different angles, ranging from data management, information retrieval to data visualization and human computer interaction, in order to study the emerging needs and objectives for data exploration, as well as the challenges and problems that need to be tackled.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87416916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wan D. Bae, Shayma Alkobaisi, S. H. Kim, Sada Narayanappa, C. Shahabi
{"title":"Supporting Range Queries on Web Data Using k-Nearest Neighbor Search","authors":"Wan D. Bae, Shayma Alkobaisi, S. H. Kim, Sada Narayanappa, C. Shahabi","doi":"10.1007/978-3-540-76925-5_5","DOIUrl":"https://doi.org/10.1007/978-3-540-76925-5_5","url":null,"abstract":"","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2007-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88311975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spam, damn spam, and statistics: using statistical analysis to locate spam web pages","authors":"Dennis Fetterly, M. Manasse, Marc Najork","doi":"10.1145/1017074.1017077","DOIUrl":"https://doi.org/10.1145/1017074.1017077","url":null,"abstract":"The increasing importance of search engines to commercial web sites has given rise to a phenomenon we call \"web spam\", that is, web pages that exist only to mislead search engines into (mis)leading users to certain web sites. Web spam is a nuisance to users as well as search engines: users have a harder time finding the information they need, and search engines have to cope with an inflated corpus, which in turn causes their cost per query to increase. Therefore, search engines have a strong incentive to weed out spam web pages from their index.We propose that some spam web pages can be identified through statistical analysis: Certain classes of spam pages, in particular those that are machine-generated, diverge in some of their properties from the properties of web pages at large. We have examined a variety of such properties, including linkage structure, page content, and page evolution, and have found that outliers in the statistical distribution of these properties are highly likely to be caused by web spam.This paper describes the properties we have examined, gives the statistical distributions we have observed, and shows which kinds of outliers are highly correlated with web spam.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72892866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Twig query processing over graph-structured XML data","authors":"Zografoula Vagena, Mirella M. Moro, V. Tsotras","doi":"10.1145/1017074.1017087","DOIUrl":"https://doi.org/10.1145/1017074.1017087","url":null,"abstract":"XML and semi-structured data is usually modeled using graph structures. Structural summaries, which have been proposed to speedup XML query processing have graph forms as well. The existent approaches for evaluating queries over tree structured data (i.e. data whose underlying structure is a tree) are not directly applicable when the data is modeled as a random graph. Moreover, they cannot be applied when structural summaries are employed and, to the best of our knowledge, no analogous techniques have been reported for this case either. As a result, the potential of structural summaries is not fully exploited.In this paper, we investigate query evaluation techniques applicable to graph-structured data. We propose efficient algorithms for the case of directed acyclic graphs, which appear in many real world situations. We then tailor our approaches to handle other directed graphs as well. Our experimental evaluation reveals the advantages of our solutions over existing methods for graph-structured data.","PeriodicalId":93360,"journal":{"name":"Proceedings of the 5th International Workshop on Exploratory Search in Databases and the Web. International Workshop on Exploratory Search in Databases and the Web (5th : 2018 : Houston, Tex.)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2004-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85950207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}