{"title":"LSH-Based Probabilistic Pruning of Inverted Indices for Sets and Ranked Lists","authors":"K. Pal, S. Michel","doi":"10.1145/3068839.3068845","DOIUrl":"https://doi.org/10.1145/3068839.3068845","url":null,"abstract":"We address the problem of index pruning without compromising the quality of ad-hoc similarity search among sets and ranked lists. We discuss three different ways to prune the index structure and, by linking the index structure with the concept of Locality Sensitive Hashing (LSH), we introduce two solutions to query processing over the pruned index. Through a probabilistic analysis we ensure that a user-defined recall goal is still guaranteed. We are able to formulate an optimization problem that can determine the optimal pruning factor for all three pruning methods. The experimental evaluations over real-world data validate that the optimal pruning factor indeed ensures the recall goal without any significant effect on the quality of similarity search on a much smaller index.","PeriodicalId":211805,"journal":{"name":"Proceedings of the 20th International Workshop on the Web and Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116423804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Path Querying Language for Federation of RDF and Relational Database","authors":"Jiahui Zhang, Xiaowang Zhang, Zhiyong Feng","doi":"10.1145/3068839.3068840","DOIUrl":"https://doi.org/10.1145/3068839.3068840","url":null,"abstract":"In this paper, we present a federated path querying language (FPQ), an extension of the nested regular path querying language that adds an axis operator to support the federation of RDF datasets and relational databases. We have proven that FPQ has more expressive power than the nested regular path query language (not to mention the regular path querying language). It enjoys the same computational complexity as the regular path query language, and its additional expressivity can be used to characterize exactly the conjunction and federation of nested regular path queries. Moreover, we discuss the expressivity of various fragments of FPQ and implement FPQ. Finally, we present an application scenario related to real-life car-pooling services.","PeriodicalId":211805,"journal":{"name":"Proceedings of the 20th International Workshop on the Web and Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125009902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Role-aware Conformity Influence Analysis in Recommender Systems","authors":"Mengzi Tang, Li Li","doi":"10.1145/3068839.3068846","DOIUrl":"https://doi.org/10.1145/3068839.3068846","url":null,"abstract":"Recommender systems play an important role in providing personalized information to users and helping address the information overload problem. Recent research has considered social theories and studied the importance of social influence in social recommendation systems. However, many publications ignored users' role information or considered only a single role. In fact, users often have many different roles. Besides, different types of users (users with different roles) might have different conformity tendencies. This inspires us to study how conformity tendency changes with users' roles in recommender systems. We first formalize conformity influence by defining a utility function and then propose a probabilistic graphical model integrating both users' roles and conformity tendency, named Role Conformity Recommender Systems (RCRS). We evaluate the proposed model on several real-world datasets. The experimental results show that our model significantly outperforms state-of-the-art approaches.","PeriodicalId":211805,"journal":{"name":"Proceedings of the 20th International Workshop on the Web and Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131770623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"\"Tell me more\" using Ladders in Wikipedia","authors":"Siarhei Bykau, Jihwan Lee, D. Srivastava, Yannis Velegrakis","doi":"10.1145/3068839.3068847","DOIUrl":"https://doi.org/10.1145/3068839.3068847","url":null,"abstract":"We focus on the problem of \"tell me more\" information related to a given fact in Wikipedia. We use the novel notion of role to link information in an infobox with different places in the text of the same Wikipedia page (space) as well as information across different revisions of the page (time). In this way, it is possible to link together pieces of information that may not represent the same real world entity, yet have served in the same role. To achieve this, we introduce a novel structure called ladder that allows such spatial and temporal linking and we show how to effectively and efficiently construct such structures from Wikipedia data.","PeriodicalId":211805,"journal":{"name":"Proceedings of the 20th International Workshop on the Web and Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121794099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowdsourcing with Diverse Groups of Users","authors":"Sara Cohen, Moran Yashinski","doi":"10.1145/3068839.3068842","DOIUrl":"https://doi.org/10.1145/3068839.3068842","url":null,"abstract":"When crowdsourcing to achieve some goal, or to gather information, there is a distinct advantage to choosing a diverse team of users. Past research has shown the advantages of diversity in the workplace, as team members bring different perspectives and points of view. Similarly, when choosing users from a crowd, user diversity must be taken into consideration. This paper studies the diverse team formation problem. More precisely, we are given a set of required skills, as well as a large set of people, each of whom has some subset of the skills. The goal is to form a team satisfying the skills, that is also diverse, as is reflected by differences in the characteristics of team members (e.g., gender, race, country of residence, economic bracket). We show that finding an optimal (diverse) team of people is an NP-complete problem. In practice, the number of candidates is likely to strongly dominate the number of skills and characteristics. Hence, we provide an algorithm that returns an optimal solution, while running in time that is indifferent to the number of candidates (but is exponential in the number of skills and characteristics). We also provide a polynomial method for approximating optimal team formation by a reduction to the problem of submodular function maximization with a matroid constraint. Extensive experimentation shows both the scalability of our methods and the quality of the solutions returned.","PeriodicalId":211805,"journal":{"name":"Proceedings of the 20th International Workshop on the Web and Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127919148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Completeness-aware Querying in SPARQL","authors":"Luis Galárraga, K. Hose, Simon Razniewski","doi":"10.1145/3068839.3068843","DOIUrl":"https://doi.org/10.1145/3068839.3068843","url":null,"abstract":"Current RDF knowledge bases (KBs) are highly incomplete. This incompleteness is a serious problem both for data users and producers. Users do not have guarantees that queries that are run on a KB deliver complete results. Data producers, on the other hand, are blind about the parts of the KB that are incomplete. Yet, completeness information management is poorly supported in the Semantic Web. No RDF storage engine supports reasoning with completeness statements. Moreover, SPARQL cannot express completeness constraints for queries. Motivated by these observations, this paper offers a vision on completeness-aware RDF querying. Our vision includes (1) the sketch of a method to reason about completeness in RDF knowledge bases, (2) two approaches to represent completeness information for SPARQL queries, and (3) an extension for the SPARQL language to express completeness constraints in queries.","PeriodicalId":211805,"journal":{"name":"Proceedings of the 20th International Workshop on the Web and Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115114738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Invariant Control in Eventually Consistent Databases","authors":"P. Flores, Frank Siqueira","doi":"10.1145/3068839.3068844","DOIUrl":"https://doi.org/10.1145/3068839.3068844","url":null,"abstract":"Due to the requirements imposed by data-intensive applications, NoSQL and NewSQL databases are becoming more present in the IT market. These products provide alternative data models to relational databases, and most of them are intrinsically distributed. These database management systems (DBMSs) relax consistency to favor availability and performance. However, applications that use NoSQL/NewSQL databases in distributed environments have to perform consistency control to avoid anomalies, such as inconsistent data and behavior. New approaches suggest the use of replicated data types (RDTs) to control conflicting updates. Another strategy is the use of different consistency models for each type of operation, using first-order logic and theorem provers to supply the programmer with tools to classify consistency in operations while maintaining system invariants. Nevertheless, using RDTs or describing application integrity constraints in languages based on first-order logic is still difficult for programmers. Aiming to simplify the definition of the most common database constraints, this paper proposes a mechanism to extract usual integrity constraints in an intermediate model, taking into account the semantics of invariances, using a mix of RDTs and first-order logic. The aim of this paper is to demonstrate how the proposed mechanism simplifies and guarantees safer programming with consistency control being performed at the application level.","PeriodicalId":211805,"journal":{"name":"Proceedings of the 20th International Workshop on the Web and Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116605269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-Effective Conceptual Design Over Taxonomies","authors":"A. Vakilian, Yodsawalai Chodpathumwan, Arash Termehchy, A. Nayyeri","doi":"10.1145/3068839.3068841","DOIUrl":"https://doi.org/10.1145/3068839.3068841","url":null,"abstract":"It is known that annotating entities in unstructured and semistructured datasets by their concepts improves the effectiveness of answering queries over these datasets. Ideally, one would like to annotate entities of all relevant concepts in a dataset. However, it takes substantial time and computational resources to annotate concepts in large datasets and an organization may have sufficient resources to annotate only a subset of relevant concepts. Clearly, it would like to annotate a subset of concepts that provides the most effective answers to queries over the dataset. We propose a formal framework that quantifies the amount by which annotating entities of concepts from a taxonomy in a dataset improves the effectiveness of answering queries over the dataset. Because the problem is NP-hard, we propose an efficient approximation for the problem. Our extensive empirical studies validate our framework and show the accuracy and efficiency of our algorithm.","PeriodicalId":211805,"journal":{"name":"Proceedings of the 20th International Workshop on the Web and Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127026893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 20th International Workshop on the Web and Databases","authors":"","doi":"10.1145/3068839","DOIUrl":"https://doi.org/10.1145/3068839","url":null,"abstract":"","PeriodicalId":211805,"journal":{"name":"Proceedings of the 20th International Workshop on the Web and Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129523737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}