Web-KR '14: Latest Publications

JTOWL: A JSON to OWL Converter

Web-KR '14 Pub Date : 2014-11-03 DOI: 10.1145/2663792.2663801

Y. Yao, R. Wu, Hui Liu

Citations: 8

Novel Query Suggestions: Initial Work Report

Web-KR '14 Pub Date : 2014-11-03 DOI: 10.1145/2663792.2663799

I. Nawrot, Oskar Gross, A. Doucet, Hannu (TT) Toivonen

Abstract: Query auto-completion (QAC) is one of the most recognizable and widely used services of modern search engines. Its goal is to assist a user in the process of query formulation. Current QAC systems are mainly reactive. They respond to the present request using past knowledge. Specifically, they mostly rely on query logs analysis or corpus terms co-occurrences and rank suggestions according to their similarity with the partial user query, their past popularity, or their temporal dynamics features (e.g. trends, bursts, seasonality in query popularity). Consequently, a suggestion to be recommended by the QAC system must be preceded with a substantial users' interest and ipso facto must be an old information. However, a growing amount of people turns to search engines to find novel information, that is emergent or recently created (not redundant) one. Conventional QAC systems are thus unable to fulfill the increasingly real-time needs of the users.

In this work-in-progress report, we introduce a new approach to QAC - the system filtering out potentially novel information and proactively delivering it to the users. It aims at providing the users with some novel insight. Thus, it caters for their open-ended or persistent and increasingly real-time information needs. The preliminary method proposed in this paper to evaluate this approach forms time specific suggestions based on a comparison of two corpora constantly being updated with new data from chosen sources. An unsupervised and language-independent algorithm relying on relative novelty of terms co-occurrences is used to generate suggestions. The initial experimental results demonstrate the effectiveness of the approach in recommending queries leading to novel information. Therefore, they prove that such a system can enhance the exploratory power of a search engine and support the proactive information search.

Citations: 2

Enabling Social Search in Time through Graphs

Web-KR '14 Pub Date : 2014-11-03 DOI: 10.1145/2663792.2663802

K. Stefanidis, Georgia Koloniari

Citations: 8

A Study on the CBOW Model's Overfitting and Stability

Web-KR '14 Pub Date : 2014-11-03 DOI: 10.1145/2663792.2663793

Qun Luo, Weiran Xu, Jun Guo

Citations: 25

Learning the Mapping Rules for Sentiment Analysis

Web-KR '14 Pub Date : 2014-11-03 DOI: 10.1145/2663792.2663796

Saravadee Sae Tan, Lay-Ki Soon, T. Lim, E. Tang, C. Loo

Citations: 3

Semantic Exploration of Sensor Data

Web-KR '14 Pub Date : 2014-11-03 DOI: 10.1145/2663792.2663800

Snehasish Banerjee, Abhishek Mishra, R. Dasgupta

Citations: 5

Learning to Match Heterogeneous Structures using Partially Labeled Data

Web-KR '14 Pub Date : 2014-11-03 DOI: 10.1145/2663792.2663797

Saravadee Sae Tan, T. Lim, Lay-Ki Soon, E. Tang

Citations: 2

Clustering and Labeling a Web Scale Document Collection using Wikipedia clusters

Web-KR '14 Pub Date : 2014-11-03 DOI: 10.1145/2663792.2663803

R. Nayak, Rachel Mills, R. D. Vries, S. Geva

Abstract: Clustering is an important technique in organising and categorising web scale documents. The main challenges faced in clustering the billions of documents available on the web are the processing power required and the sheer size of the datasets available. More importantly, it is nigh impossible to generate the labels for a general web document collection containing billions of documents and a vast taxonomy of topics. However, document clusters are most commonly evaluated by comparison to a ground truth set of labels for documents. This paper presents a clustering and labeling solution where the Wikipedia is clustered and hundreds of millions of web documents in ClueWeb12 are mapped on to those clusters. This solution is based on the assumption that the Wikipedia contains such a wide range of diverse topics that it represents a small scale web. We found that it was possible to perform the web scale document clustering and labeling process on one desktop computer under a couple of days for the Wikipedia clustering solution containing about 1000 clusters. It takes longer to execute a solution with finer granularity clusters such as 10,000 or 50,000. These results were evaluated using a set of external data.

Citations: 17

Structured Information Extraction from Natural Disaster Events on Twitter

Web-KR '14 Pub Date : 2014-11-03 DOI: 10.1145/2663792.2663794

Sandeep Panem, Manish Gupta, Vasudeva Varma

Abstract: As soon as natural disaster events happen, users are eager to know more about them. However, search engines currently provide a ten blue links interface for queries related to such events. Relevance of results for such queries can be significantly improved if users are shown a structured summary of the fresh events related to such queries. This would not just reduce the number of user clicks to get the relevant information but would also help users get updated with more fine grained attribute-level information. Twitter is a great source that can be exploited for obtaining such fine-grained structured information for fresh natural disaster events. Such events are often reported on Twitter much earlier than on other news media. However, extracting such structured information from tweets is challenging because: 1. tweets are noisy and ambiguous; 2. there is no well defined schema for various types of natural disaster events; 3. it is not trivial to extract attribute-value pairs and facts from unstructured text; and 4. it is difficult to find good mappings between extracted attributes and attributes in the event schema.

We propose algorithms to extract attribute-value pairs, and also devise novel mechanisms to map such pairs to manually generated schemas for natural disaster events. Besides the tweet text, we also leverage text from URL links in the tweets to fill such schemas. Our schemas are temporal in nature and the values are updated whenever fresh information flows in from human sensors on Twitter. Evaluation on ~58000 tweets for 20 events shows that our system can fill such event schemas with an F1 of ~0.6.

Citations: 27

Structure Learning of Bayesian Network with Latent Variables by Weight-Induced Refinement

Web-KR '14 Pub Date : 2014-11-03 DOI: 10.1145/2663792.2663798

Chao He, Kun Yue, Hao Wu, Weiyi Liu

Citations: 8