{"title":"CrowdLink: An Error-Tolerant Model for Linking Complex Records","authors":"C. Zhang, Rui Meng, Lei Chen, Feida Zhu","doi":"10.1145/2795218.2795222","DOIUrl":"https://doi.org/10.1145/2795218.2795222","url":null,"abstract":"Record linkage (RL) refers to the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, databases), which is a long-standing challenge in database management. Algorithmic approaches have been proposed to improve RL quality, but remain far from perfect. Crowdsourcing offers a more accurate but expensive (and slow) way to bring human insight into the process. In this paper, we propose a new probabilistic model, namely CrowdLink, to tackle the above limitations. In particular, our model gracefully handles the crowd error and the correlation among different pairs, as well as enables us to decompose the records into small pieces (i.e. attributes) so that crowdsourcing workers can easily verify. Further, we develop efficient and effective algorithms to select the most valuable questions, in order to reduce the monetary cost of crowdsourcing. We conducted extensive experiments on both synthetic and real-world datasets. The experimental results verified the effectiveness and the applicability of our model.","PeriodicalId":211132,"journal":{"name":"Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131719745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preferential Diversity","authors":"Xiaoyu Ge, Panos K. Chrysanthis, Alexandros Labrinidis","doi":"10.1145/2795218.2795224","DOIUrl":"https://doi.org/10.1145/2795218.2795224","url":null,"abstract":"The ever increasing supply of data is bringing a renewed attention to query personalization. Query personalization is a technique that utilizes user preferences with the goal of providing relevant results to the users. Along with preferences, diversity is another important aspect of query personalization especially useful during data exploration. The goal of result diversification is to reduce the amount of redundant information included in the results. Most previous approaches of result diversification focus solely on generating the most diverse results, which do not take user preferences into account. In this paper, we propose a novel framework called Preferential Diversity (PrefDiv) that aims to support both relevancy and diversity of user query results. PrefDiv utilizes user preference models that return ranked results and reduces the redundancy of results in an efficient and flexible way. PrefDiv maintains the balance between relevancy and diversity of the query results by providing users with the ability to control the trade-off between the two. We describe an implementation of PrefDiv on top of the HYPRE preference model, which allows users to specify both qualitative and quantitative preferences and unifies them using the concept of preference intensities. We experimentally evaluate its performance by comparing with state-of-the-art diversification techniques; our results indicate that PrefDiv achieves significantly better balance between diversity and relevance.","PeriodicalId":211132,"journal":{"name":"Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128370458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Method of Complex Event Processing over XML Streams","authors":"Tatsuki Matsuda, Yuki Uchida, Satoru Fujita","doi":"10.1145/2795218.2795220","DOIUrl":"https://doi.org/10.1145/2795218.2795220","url":null,"abstract":"This paper describes a query processing engine for multiple continuous XML data streams with correlated data as a notification mechanism for navigating data exploration. Stream processing, including formal models for stream filtering, union, activation, decomposition, and partition, is formulated in algebraic expressions. In addition, a query language, called QLMXS, over XML streams for complex event processing is described. QLMXS supports all functions of the algebraic expressions in a SQL-like form. QLMXS queries are converted into a visibly pushdown automaton (VPA) that analyzes complex event data from the XML streams. The VPA engine concurrently processes multiple XML data on multiple levels; therefore, it is very important to tune the performance of the engine. Four optimization methods are proposed to improve performance by utilizing VPA and XML features: VPA-state reduction, VPA unification, delayed evaluation, and elimination of unnecessary XML processing. Experimental results demonstrate that VPA unification increases the processing speed of the VPA engine 1.6 times, and the overall processing speed is increased 2.6 times.","PeriodicalId":211132,"journal":{"name":"Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128381011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Principled Optimization Frameworks for Query Reformulation of Database Queries","authors":"Gautam Das","doi":"10.1145/2795218.2795227","DOIUrl":"https://doi.org/10.1145/2795218.2795227","url":null,"abstract":"Traditional databases have traditionally supported the Boolean retrieval model, where a query returns all tuples that match the selection conditions specified -- no more and no less. Such a query model is often inconvenient for naive users conducting searches that are often exploratory in nature, since the user may not have a complete idea, or a firm opinion of what she may be looking for. This is especially relevant in the context of the Deep Web, which offers a plethora of searchable data sources such as electronic products, transportation choices, apparel, investment options, etc. Users often encounter two types of problems: (a) they may under-specify the items of interest, and find too many items satisfying the given conditions (the many answers problem), or (b) they may over-specify the items of interest, and find no item in the source satisfying all the provided conditions (the empty answer problem). In this talk, I discuss our recent efforts in developing techniques for iterative \"query reformulation\" by which the system guides the user in a systematic way through several small steps, where each step suggests slight query modifications, until the query reaches a form that generates desirable answers. Our proposed approaches for suggesting query reformulations are driven by novel probabilistic frameworks based on optimizing a wide variety of application-dependent objective functions.","PeriodicalId":211132,"journal":{"name":"Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121684172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unifying Qualitative and Quantitative Database Preferences to Enhance Query Personalization","authors":"Roxana Gheorghiu, Alexandros Labrinidis, Panos K. Chrysanthis","doi":"10.1145/2795218.2795223","DOIUrl":"https://doi.org/10.1145/2795218.2795223","url":null,"abstract":"Query personalization can be an effective technique in dealing with the data scalability challenge, primarily from the human point of view, i.e., making big data easier to use. In order to customize their query results, users need to express their preferences in a simple and user-friendly manner. In this paper, we present a graph-based theoretical framework and a prototype system that unify qualitative and quantitative preferences, while eliminating their disadvantages. Our integrated system allows for (1) the specification of database preferences and the creation of user preference profiles in a user-friendly manner, (2) the manipulation of preferences of individuals or groups of users and (3) total ordering of the tuples in the database, matching both qualitative and quantitative preferences, hence significantly increasing the number of tuples covered by the user preferences. We confirmed the latter experimentally by comparing our preference selection algorithm with Fagin's TA algorithm.","PeriodicalId":211132,"journal":{"name":"Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133374309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Like This: Ranked Search of Genomic Data Vision Paper","authors":"V. M. Megler, D. Maier, D. Bottomly, Libbey White, S. McWeeney, B. Wilmot","doi":"10.1145/2795218.2795221","DOIUrl":"https://doi.org/10.1145/2795218.2795221","url":null,"abstract":"High-throughput genetic sequencing produces the ultimate \"big data\": a human genome sequence contains more than 3B base pairs, and more and more characteristics, or annotations, are being recorded at the base-pair level. Locating areas of interest within the genome is a challenge for researchers, limiting their investigations. We describe our vision of adapting \"big data\" ranked search to the problem of searching the genome. Our goal is to make searching for data as easy for scientists as searching the Internet.","PeriodicalId":211132,"journal":{"name":"Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115954252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explore-By-Example: A New Database Service for Interactive Data Exploration","authors":"Y. Diao","doi":"10.1145/2795218.2795226","DOIUrl":"https://doi.org/10.1145/2795218.2795226","url":null,"abstract":"Traditional DBMSs are suited for applications in which the structure, meaning and contents of the database, as well as the questions (queries) to be asked, are all well-understood. However, this is no longer true when the volume and diversity of data grow at an unprecedented rate, while the user ability to comprehend data remains (as limited) as before. To address the increasing disparity in the \"big data - same humans\" problem, our project explores a new approach of system-aided exploration of a big data space and automatic learning of the user interest in order to retrieve all objects that match the user interest -- we call this new service \"interactive data exploration\", which complements the traditional querying interface of a database system. In this talk, I introduce a new framework for interactive data exploration, called \"Explore-by-Example\", which iteratively seeks user relevance feedback on database samples and uses such feedback to finally predict a query that retrieves all objects of interest to the user. The goal is to make such exploration converge fast to the true user interest model, while minimizing the user labeling effort and providing interactive performance in each iteration. I discuss a range of techniques and optimizations to do so for linear patterns and complex non-linear patterns. Our user study indicates that our approach can significantly reduce the user effort and the total exploration time, compared with the common practice of manual exploration. I finally conclude the talk by pointing out a host of new challenges, ranging from application of active learning theory, to database optimizations, to visualization.","PeriodicalId":211132,"journal":{"name":"Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128971382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diversifying with Few Regrets, But too Few to Mention","authors":"Zaeem Hussain, Hina A. Khan, M. Sharaf","doi":"10.1145/2795218.2795225","DOIUrl":"https://doi.org/10.1145/2795218.2795225","url":null,"abstract":"Representative data provide users with a concise overview of their potentially large query results. Recently, diversity maximization has been adopted as one technique to generate representative data with high coverage and low redundancy. Orthogonally, regret minimization has emerged as another technique to generate representative data with high utility that satisfy the user's preference. In reality, however, users typically have some pre-specified preferences over some dimensions of the data, while expecting good coverage over the other dimensions. Motivated by that need, in this work we propose a novel scheme called ReDi, which aims to generate representative data that balance the tradeoff between regret minimization and diversity maximization. ReDi is based on a hybrid objective function that combines both regret and diversity. Additionally, it employs several algorithms that are designed to maximize that objective function. We perform extensive experimental evaluation to measure the tradeoff between the effectiveness and efficiency provided by the different ReDi algorithms.","PeriodicalId":211132,"journal":{"name":"Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121237520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web","authors":"","doi":"10.1145/2795218","DOIUrl":"https://doi.org/10.1145/2795218","url":null,"abstract":"","PeriodicalId":211132,"journal":{"name":"Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116468460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}