Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web: Latest Papers

CrowdLink: An Error-Tolerant Model for Linking Complex Records
C. Zhang, Rui Meng, Lei Chen, Feida Zhu
DOI: https://doi.org/10.1145/2795218.2795222
Published: 2015-05-31
Abstract: Record linkage (RL) refers to the task of finding records in a data set that refer to the same entity across different data sources (e.g., data files, books, websites, databases), which is a long-standing challenge in database management. Algorithmic approaches have been proposed to improve RL quality, but remain far from perfect. Crowdsourcing offers a more accurate but expensive (and slow) way to bring human insight into the process. In this paper, we propose a new probabilistic model, namely CrowdLink, to tackle the above limitations. In particular, our model gracefully handles crowd error and the correlation among different pairs, and enables us to decompose the records into small pieces (i.e., attributes) that crowdsourcing workers can easily verify. Further, we develop efficient and effective algorithms to select the most valuable questions, in order to reduce the monetary cost of crowdsourcing. We conducted extensive experiments on both synthetic and real-world datasets. The experimental results verified the effectiveness and the applicability of our model.
Citations: 6

Preferential Diversity
Xiaoyu Ge, Panos K. Chrysanthis, Alexandros Labrinidis
DOI: https://doi.org/10.1145/2795218.2795224
Published: 2015-05-31
Abstract: The ever increasing supply of data is bringing renewed attention to query personalization. Query personalization is a technique that utilizes user preferences with the goal of providing relevant results to the users. Along with preferences, diversity is another important aspect of query personalization, especially useful during data exploration. The goal of result diversification is to reduce the amount of redundant information included in the results. Most previous approaches to result diversification focus solely on generating the most diverse results and do not take user preferences into account. In this paper, we propose a novel framework called Preferential Diversity (PrefDiv) that aims to support both relevancy and diversity of user query results. PrefDiv utilizes user preference models that return ranked results and reduces the redundancy of results in an efficient and flexible way. PrefDiv maintains the balance between relevancy and diversity of the query results by providing users with the ability to control the trade-off between the two. We describe an implementation of PrefDiv on top of the HYPRE preference model, which allows users to specify both qualitative and quantitative preferences and unifies them using the concept of preference intensities. We experimentally evaluate its performance by comparing with state-of-the-art diversification techniques; our results indicate that PrefDiv achieves significantly better balance between diversity and relevance.
Citations: 11

Method of Complex Event Processing over XML Streams
Tatsuki Matsuda, Yuki Uchida, Satoru Fujita
DOI: https://doi.org/10.1145/2795218.2795220
Published: 2015-05-31
Abstract: This paper describes a query processing engine for multiple continuous XML data streams with correlated data as a notification mechanism for navigating data exploration. Stream processing, including formal models for stream filtering, union, activation, decomposition, and partition, is formulated in algebraic expressions. In addition, a query language, called QLMXS, over XML streams for complex event processing is described. QLMXS supports all functions of the algebraic expressions in a SQL-like form. QLMXS queries are converted into a visibly pushdown automaton (VPA) that analyzes complex event data from the XML streams. The VPA engine concurrently processes multiple XML data on multiple levels; therefore, it is very important to tune the performance of the engine. Four optimization methods are proposed to improve performance by utilizing VPA and XML features: VPA-state reduction, VPA unification, delayed evaluation, and elimination of unnecessary XML processing. Experimental results demonstrate that VPA unification increases the processing speed of the VPA engine 1.6 times, and the overall processing speed is increased 2.6 times.
Citations: 0

Principled Optimization Frameworks for Query Reformulation of Database Queries
Gautam Das
DOI: https://doi.org/10.1145/2795218.2795227
Published: 2015-05-31
Abstract: Databases have traditionally supported the Boolean retrieval model, where a query returns all tuples that match the selection conditions specified -- no more and no less. Such a query model is often inconvenient for naive users conducting searches that are exploratory in nature, since the user may not have a complete idea, or a firm opinion, of what she may be looking for. This is especially relevant in the context of the Deep Web, which offers a plethora of searchable data sources such as electronic products, transportation choices, apparel, investment options, etc. Users often encounter two types of problems: (a) they may under-specify the items of interest, and find too many items satisfying the given conditions (the many answers problem), or (b) they may over-specify the items of interest, and find no item in the source satisfying all the provided conditions (the empty answer problem). In this talk, I discuss our recent efforts in developing techniques for iterative "query reformulation" by which the system guides the user in a systematic way through several small steps, where each step suggests slight query modifications, until the query reaches a form that generates desirable answers. Our proposed approaches for suggesting query reformulations are driven by novel probabilistic frameworks based on optimizing a wide variety of application-dependent objective functions.
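Illustration (not from the talk): the many answers and empty answer problems described in this abstract can be reproduced with a toy Boolean filter. The sketch below is only an assumption-laden example. The catalogue, the condition names, and the helpers boolean_query and relax_one are all hypothetical, and dropping one condition at a time is a naive stand-in for a reformulation step, not the probabilistic framework the abstract refers to.

```python
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]
Condition = Callable[[Item], bool]

# Toy catalogue; every value here is made up for illustration.
CATALOG: List[Item] = [
    {"name": "laptop A", "price": 900, "ram_gb": 8, "weight_kg": 1.9},
    {"name": "laptop B", "price": 1400, "ram_gb": 16, "weight_kg": 1.2},
    {"name": "laptop C", "price": 700, "ram_gb": 8, "weight_kg": 2.3},
    {"name": "laptop D", "price": 1100, "ram_gb": 16, "weight_kg": 1.6},
]


def boolean_query(items: List[Item], conditions: Dict[str, Condition]) -> List[Item]:
    """Boolean retrieval: return exactly the items that satisfy every condition."""
    return [it for it in items if all(cond(it) for cond in conditions.values())]


def relax_one(items: List[Item], conditions: Dict[str, Condition]) -> Dict[str, int]:
    """Drop each condition in turn and count the answers of the relaxed query.

    Hypothetical, naive reformulation step used only to show the idea of
    suggesting a slightly modified query.
    """
    counts: Dict[str, int] = {}
    for name in conditions:
        rest = {k: v for k, v in conditions.items() if k != name}
        counts[name] = len(boolean_query(items, rest))
    return counts


# Over-specified query: no item satisfies all three conditions.
over_specified: Dict[str, Condition] = {
    "cheap": lambda it: it["price"] <= 1000,
    "big_ram": lambda it: it["ram_gb"] >= 16,
    "light": lambda it: it["weight_kg"] <= 1.5,
}
# Under-specified query: every item satisfies the single condition.
under_specified: Dict[str, Condition] = {
    "any_ram": lambda it: it["ram_gb"] >= 8,
}

empty = boolean_query(CATALOG, over_specified)
many = boolean_query(CATALOG, under_specified)
relaxed = relax_one(CATALOG, over_specified)

print(len(empty), len(many), relaxed)
```

In this toy data, only dropping the "cheap" condition yields a non-empty answer, which is the kind of small modification a reformulation system might suggest to the user.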
Citations: 0

Unifying Qualitative and Quantitative Database Preferences to Enhance Query Personalization
Roxana Gheorghiu, Alexandros Labrinidis, Panos K. Chrysanthis
DOI: https://doi.org/10.1145/2795218.2795223
Published: 2015-05-31
Abstract: Query personalization can be an effective technique in dealing with the data scalability challenge, primarily from the human point of view, i.e., making big data easier to use. In order to customize their query results, users need to express their preferences in a simple and user-friendly manner. In this paper, we present a graph-based theoretical framework and a prototype system that unify qualitative and quantitative preferences, while eliminating their disadvantages. Our integrated system allows for (1) the specification of database preferences and the creation of user preference profiles in a user-friendly manner, (2) the manipulation of preferences of individuals or groups of users and (3) total ordering of the tuples in the database, matching both qualitative and quantitative preferences, hence significantly increasing the number of tuples covered by the user preferences. We confirmed the latter experimentally by comparing our preference selection algorithm with Fagin's TA algorithm.
Citations: 6

Data Like This: Ranked Search of Genomic Data Vision Paper
V. M. Megler, D. Maier, D. Bottomly, Libbey White, S. McWeeney, B. Wilmot
DOI: https://doi.org/10.1145/2795218.2795221
Published: 2015-05-31
Abstract: High-throughput genetic sequencing produces the ultimate "big data": a human genome sequence contains more than 3B base pairs, and more and more characteristics, or annotations, are being recorded at the base-pair level. Locating areas of interest within the genome is a challenge for researchers, limiting their investigations. We describe our vision of adapting "big data" ranked search to the problem of searching the genome. Our goal is to make searching for data as easy for scientists as searching the Internet.
Citations: 2

Explore-By-Example: A New Database Service for Interactive Data Exploration
Y. Diao
DOI: https://doi.org/10.1145/2795218.2795226
Published: 2015-05-31
Abstract: Traditional DBMSs are suited for applications in which the structure, meaning and contents of the database, as well as the questions (queries) to be asked, are all well-understood. However, this is no longer true when the volume and diversity of data grow at an unprecedented rate, while the user's ability to comprehend data remains as limited as before. To address the increasing disparity in the "big data - same humans" problem, our project explores a new approach of system-aided exploration of a big data space and automatic learning of the user interest in order to retrieve all objects that match the user interest -- we call this new service "interactive data exploration", which complements the traditional querying interface of a database system. In this talk, I introduce a new framework for interactive data exploration, called "Explore-by-Example", which iteratively seeks user relevance feedback on database samples and uses such feedback to finally predict a query that retrieves all objects of interest to the user. The goal is to make such exploration converge fast to the true user interest model, while minimizing the user labeling effort and providing interactive performance in each iteration. I discuss a range of techniques and optimizations to do so for linear patterns and complex non-linear patterns. Our user study indicates that our approach can significantly reduce the user effort and the total exploration time, compared with the common practice of manual exploration. I conclude the talk by pointing out a host of new challenges, ranging from application of active learning theory, to database optimizations, to visualization.
Citations: 0

Diversifying with Few Regrets, But too Few to Mention
Zaeem Hussain, Hina A. Khan, M. Sharaf
DOI: https://doi.org/10.1145/2795218.2795225
Published: 2015-05-31
Abstract: Representative data provide users with a concise overview of their potentially large query results. Recently, diversity maximization has been adopted as one technique to generate representative data with high coverage and low redundancy. Orthogonally, regret minimization has emerged as another technique to generate representative data with high utility that satisfy the user's preference. In reality, however, users typically have some pre-specified preferences over some dimensions of the data, while expecting good coverage over the other dimensions. Motivated by that need, in this work we propose a novel scheme called ReDi, which aims to generate representative data that balance the tradeoff between regret minimization and diversity maximization. ReDi is based on a hybrid objective function that combines both regret and diversity. Additionally, it employs several algorithms that are designed to maximize that objective function. We perform extensive experimental evaluation to measure the tradeoff between the effectiveness and efficiency provided by the different ReDi algorithms.
Citations: 10

Proceedings of the Second International Workshop on Exploratory Search in Databases and the Web
DOI: https://doi.org/10.1145/2795218
Citations: 0