Proceedings of the 18th International Workshop on Web and Databases: Latest Publications

Truth Finding with Attribute Partitioning

Proceedings of the 18th International Workshop on Web and Databases Pub Date : 2015-05-31 DOI: 10.1145/2767109.2767118

M. Ba, Roxana Horincar, P. Senellart, Huayu Wu

Citations: 8

Discovering Subsumption Relationships for Web-Based Ontologies

Proceedings of the 18th International Workshop on Web and Databases Pub Date : 2015-05-31 DOI: 10.1145/2767109.2767111

Dana Movshovitz-Attias, Steven Euijong Whang, Natasha Noy, A. Halevy

Abstract: As search engines are becoming smarter at interpreting user queries and providing meaningful responses, they rely on ontologies to understand the meaning of entities. Creating ontologies manually is a laborious process, and resulting ontologies may not reflect the way users think about the world, as many concepts used in queries are noisy, and not easily amenable to formal modeling. There has been considerable effort in generating ontologies from Web text and query streams, which may be more reflective of how users query and write content. In this paper, we describe the LATTE system that automatically generates a subconcept-superconcept hierarchy, which is critical for using ontologies to answer queries. LATTE combines signals based on word-vector representations of concepts and dependency parse trees; however, LATTE derives most of its power from an ontology of attributes extracted from the Web that indicates the aspects of concepts that users find important. LATTE achieves an F1 score of 74%, which is comparable to expert agreement on a similar task. We additionally demonstrate the usefulness of LATTE in detecting high quality concepts from an existing resource of IsA links.

Citations: 11

Addressing Instance Ambiguity in Web Harvesting

Proceedings of the 18th International Workshop on Web and Databases Pub Date : 2015-05-31 DOI: 10.1145/2767109.2767114

Zhixu Li, Xiangliang Zhang, Hai Huang, Qing Xie, Jia Zhu, Xiaofang Zhou

Abstract: Web Harvesting enables the enrichment of incomplete data sets by retrieving required information from the Web. However, the ambiguity of instances may greatly decrease the quality of the harvested data, given that any instance in the local data set may become ambiguous when attempting to identify it on the Web. Although plenty of disambiguation methods have been proposed to deal with the ambiguity problems in various settings, none of them are able to handle the instance ambiguity problem in Web Harvesting. In this paper, we propose to do instance disambiguation in Web Harvesting with a novel disambiguation method inspired by the idea of collaborative identity recognition. In particular, we expect to find some common properties in forms of latent shared attribute values among instances in the list, such that these shared attribute values can differentiate instances within the list against those ambiguous ones on the Web. Our extensive experimental evaluation illustrates the utility of collaborative disambiguation for a popular Web Harvesting application, and shows that it substantially improves the accuracy of the harvested data.

Citations: 1

Long-term Optimization of Update Frequencies for Decaying Information

Proceedings of the 18th International Workshop on Web and Databases Pub Date : 2015-05-31 DOI: 10.1145/2767109.2767113

Simon Razniewski, W. Nutt

Abstract: Many kinds of information, such as addresses, crawls of webpages, or academic affiliations, are prone to becoming outdated over time. Therefore, in some applications, updates are performed periodically in order to keep the correctness and usefulness of such information high. As refreshing information usually has a cost, e.g. computation time, network bandwidth or human work time, a problem is to find the right update frequency depending on the benefit gained from the information and on the speed with which the information is expected to get outdated. This is especially important since often entities exhibit a different speed of getting outdated, as, e.g., addresses of students change more frequently than addresses of pensionists, or news portals change more frequently than personal homepages. Thus, there is no uniform best update frequency for all entities. Previous work [5] on data freshness has focused on the question of how to best distribute a fixed budget for updates among various entities, which is of interest in the short-term, when resources are fixed and cannot be adjusted. In the long-term, many businesses are able to adjust their resources in order to optimize their gain. Then, the problem is not one of distributing a fixed number of updates but one of determining the frequency of updates that optimizes the overall gain from the information. In this paper, we investigate how the optimal update frequency for decaying information can be determined. We show that the optimal update frequency is independent for each entity, and how simple iteration can be used to find the optimal update frequency. An implementation of our solution for exponential decay is available online.

Citations: 1
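
The trade-off described in the "Long-term Optimization of Update Frequencies for Decaying Information" entry (refresh cost against the benefit of up-to-date information) can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: an entry is taken to be correct with probability exp(-decay * t) at time t after its last refresh, to yield `benefit` per unit time while correct, and to cost `cost` per refresh. Bisection stands in for the "simple iteration" the abstract mentions, whose details are not given in this listing. The function names are hypothetical.

```python
import math
from typing import Optional


def average_gain(interval: float, benefit: float, decay: float, cost: float) -> float:
    """Average net gain per unit time when refreshing every `interval` time units.

    Assumed model (not from the paper): exponential decay of correctness.
    """
    # Expected time the entry stays correct within one interval:
    # the integral of exp(-decay * t) from 0 to interval.
    expected_fresh_time = (1.0 - math.exp(-decay * interval)) / decay
    return (benefit * expected_fresh_time - cost) / interval


def optimal_interval(benefit: float, decay: float, cost: float) -> Optional[float]:
    """Return the interval maximizing average_gain, or None if refreshing never pays off."""
    if benefit <= 0 or decay <= 0 or cost <= 0:
        raise ValueError("benefit, decay and cost must all be positive")
    # One refresh can earn at most benefit / decay, so a higher cost is never recovered.
    if benefit / decay <= cost:
        return None

    def stationarity(interval: float) -> float:
        # Negative before the maximum of average_gain, positive after it,
        # and strictly increasing in between, so it has exactly one root.
        e = math.exp(-decay * interval)
        return benefit * (1.0 - e) / decay - cost - benefit * interval * e

    # Bracket the root, then halve the bracket repeatedly (bisection).
    low, high = 0.0, 1.0 / decay
    while stationarity(high) <= 0.0:
        high *= 2.0
    for _ in range(100):
        mid = (low + high) / 2.0
        if stationarity(mid) < 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    best = optimal_interval(benefit=1.0, decay=1.0, cost=0.1)
    print("refresh interval:", best, "update frequency:", 1.0 / best)
```

Under this assumed model each entry is optimized on its own parameters, which is consistent with the abstract's statement that the optimal update frequency is independent for each entity.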

FOREST: Focused Object Retrieval by Exploiting Significant Tag Paths

Proceedings of the 18th International Workshop on Web and Databases Pub Date : 2015-05-31 DOI: 10.1145/2767109.2767112

Marilena Oita, P. Senellart

Citations: 1

TriAL-QL: Distributed Processing of Navigational Queries

Proceedings of the 18th International Workshop on Web and Databases Pub Date : 2015-05-31 DOI: 10.1145/2767109.2767115

Martin Przyjaciel-Zablocki, A. Schätzle, Adriano Lange

Abstract: Navigational queries are among the most natural query patterns for RDF data, but yet most existing RDF query languages fail to cover all the varieties inherent to its triple-based model, including SPARQL 1.1 and its derivatives. As a consequence, the development of more expressive RDF languages is of general interest. With TriAL* [14], there exists an expressive algebra which subsumes many previous approaches, while adding novel features that are not expressible in most other RDF query languages based on the standard graph model. However, its algebraic notation is inappropriate for practical usage and it is not supported by any existing RDF triple store. In this paper, we propose TriAL-QL, an easy to write and grasp language for TriAL*, preserving its compositional algebraic structure. We present an implementation based on Impala, a massive parallel SQL query engine on Hadoop, using an optimized semi-naive evaluation for the recursive fragments of TriAL*. This way, we support both data-intensive ETL-like workloads and explorative ad-hoc style queries. To demonstrate the scalability and expressiveness of our approach, we conducted experiments on generated social networks with up to 1.8 billion triples and compared different execution strategies to a Hive-based solution.

Citations: 6
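
The TriAL-QL entry refers to semi-naive evaluation of recursive queries. As a rough illustration of that general technique (not of TriAL-QL's Impala-based implementation, which this listing does not describe), the sketch below computes reachability over a set of edges in memory. The function name and the sample `knows` data are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def transitive_closure_seminaive(edges: Set[Edge]) -> Set[Edge]:
    """Compute all reachability pairs using semi-naive evaluation."""
    # Index the base edges by source node so each join step is a lookup.
    successors: Dict[str, Set[str]] = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    closure = set(edges)  # every fact derived so far
    delta = set(edges)    # facts that were new in the previous round
    while delta:
        # Join only the newly derived facts with the base edges,
        # instead of re-joining the whole closure in every round.
        derived = {
            (src, nxt)
            for src, mid in delta
            for nxt in successors.get(mid, ())
        }
        delta = derived - closure  # keep only facts not seen before
        closure |= delta
    return closure


if __name__ == "__main__":
    knows = {("a", "b"), ("b", "c"), ("c", "d")}
    print(sorted(transitive_closure_seminaive(knows)))
```

The loop stops when a round derives nothing new, so cyclic inputs terminate as well. A naive evaluation would recompute the join over the full closure each round, which is the repeated work the semi-naive variant avoids.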

Person-Name Parsing for Linking User Web Profiles

Proceedings of the 18th International Workshop on Web and Databases Pub Date : 2015-05-31 DOI: 10.1145/2767109.2767117

G. Das, Xiang Li, Ang Sun, Hakan Kardes, Xin Wang

Abstract: A person-name parser involves the identification of constituent parts of a person's name. Due to multiple writing styles ("John Smith" versus "Smith, John"), extra information ("John Smith, PhD", "Rev. John Smith"), and country-specific last-name prefixes ("Jean van de Velde"), parsing fullname strings from user profiles on Web 2.0 applications is not straightforward. To the best of our knowledge, we are the first to address this problem systematically by proposing machine learning approaches for parsing noisy fullname strings. In this paper, we propose several types of features based on token statistics, surface-patterns, and specialized dictionaries and apply them within a sequence modeling framework to learn a fullname parser. In particular, we propose the use of "bucket" features based on (name-token, label) distributions in lieu of "term" features frequently used in various Natural Language Processing applications to prevent the growth of learning parameters as a function of the training data size. We experimentally illustrate the generalizability, effectiveness, and efficiency aspects of our proposed features for noisy fullname parsing on fullname strings from the popular, professional networking website LinkedIn and commonly-used person names in the United States. On these datasets, our fullname parser significantly outperforms both the parser trained using classification approaches and a commercially-available name parsing solution.

Citations: 5

Analyzing Crowd Rankings

Proceedings of the 18th International Workshop on Web and Databases Pub Date : 2015-05-31 DOI: 10.1145/2767109.2767110

Julia Stoyanovich, Marie Jacob, Xuemei Gong

Citations: 14

The elephant in the room: getting value from Big Data

Proceedings of the 18th International Workshop on Web and Databases Pub Date : 2015-05-31 DOI: 10.1145/2767109.2770014

S. Abiteboul, X. Dong, Oren Etzioni, D. Srivastava, G. Weikum, Julia Stoyanovich, Fabian M. Suchanek

Citations: 12

IBEX: Harvesting Entities from the Web Using Unique Identifiers

Proceedings of the 18th International Workshop on Web and Databases Pub Date : 2015-05-04 DOI: 10.1145/2767109.2767116

Aliaksandr Talaika, J. Biega, Antoine Amarilli, Fabian M. Suchanek

Citations: 16