Proceedings of the Ninth ACM International Conference on Web Search and Data Mining: Latest Papers

Session details: Big Data Algorithms
R. Lempel
{"title":"Session details: Big Data Algorithms","authors":"R. Lempel","doi":"10.1145/3253877","DOIUrl":"https://doi.org/10.1145/3253877","url":null,"abstract":"","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76888284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
EgoSet: Exploiting Word Ego-networks and User-generated Ontology for Multifaceted Set Expansion
Xin Rong, Zhe Chen, Q. Mei, Eytan Adar
{"title":"EgoSet: Exploiting Word Ego-networks and User-generated Ontology for Multifaceted Set Expansion","authors":"Xin Rong, Zhe Chen, Q. Mei, Eytan Adar","doi":"10.1145/2835776.2835808","DOIUrl":"https://doi.org/10.1145/2835776.2835808","url":null,"abstract":"A key challenge of entity set expansion is that multifaceted input seeds can lead to significant incoherence in the result set. In this paper, we present a novel solution to handling multifaceted seeds by combining existing user-generated ontologies with a novel word-similarity metric based on skip-grams. By blending the two resources we are able to produce sparse word ego-networks that are centered on the seed terms and are able to capture semantic equivalence among words. We demonstrate that the resulting networks possess internally-coherent clusters, which can be exploited to provide non-overlapping expansions, in order to reflect different semantic classes of the seeds. Empirical evaluation against state-of-the-art baselines shows that our solution, EgoSet, is able to not only capture multiple facets in the input query, but also generate expansions for each facet with higher precision.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75793893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 49
Understanding User Attention and Engagement in Online News Reading
Dmitry Lagun, M. Lalmas
{"title":"Understanding User Attention and Engagement in Online News Reading","authors":"Dmitry Lagun, M. Lalmas","doi":"10.1145/2835776.2835833","DOIUrl":"https://doi.org/10.1145/2835776.2835833","url":null,"abstract":"Prior work on user engagement with online media identified web page dwell time as a key metric reflecting level of user engagement with online news articles. While on average, dwell time gives a reasonable estimate of user experience with a news article, it is not able to capture important aspects of user interaction with the page, such as how much time a user spends reading the article vs. viewing the comment posted by other users, or the actual proportion of article read by the user. In this paper, we propose a set of user engagement classes along with new user engagement metrics that, unlike dwell time, more accurately reflect user experience with the content. Our user engagement classes provide clear and interpretable taxonomy of user engagement with online news, and are defined based on amount of time user spends on the page, proportion of the article user actually reads and the amount of interaction users performs with the comments. 
Moreover, we demonstrate that our metrics are relatively easier to predict from the news article content, compared to the dwell time, making optimization of user engagement more attainable goal.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73166494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 97
Discriminative Learning of Infection Models
Nir Rosenfeld, M. Nitzan, A. Globerson
{"title":"Discriminative Learning of Infection Models","authors":"Nir Rosenfeld, M. Nitzan, A. Globerson","doi":"10.1145/2835776.2835802","DOIUrl":"https://doi.org/10.1145/2835776.2835802","url":null,"abstract":"Infection and diffusion processes over networks arise in many domains. These introduce many challenging prediction tasks, such as influence estimation, trend prediction, and epidemic source localization. The standard approach to such problems is generative: assume an underlying infection model, learn its parameters, and infer the required output. In order to learn efficiently, the chosen infection models are often simple, and learning is focused on inferring the parameters of the model rather than on optimizing prediction accuracy. Here we argue that for prediction tasks, a discriminative approach is more adequate. We introduce DIMPLE, a novel discriminative learning framework for training classifiers based on dynamic infection models. We show how highly non-linear predictors based on infection models can be \"linearized\" by considering a larger class of prediction functions. Efficient learning over this class is performed by constructing \"infection kernels\" based on the outputs of infection models, and can be plugged into any kernel-supporting framework. DIMPLE can be applied to virtually any infection-related prediction task and any infection model for which the desired output can be calculated or simulated. For influence estimation in well-known infection models, we show that the kernel can either be computed in closed form, or reduces to estimating co-influence of seed pairs. We apply DIMPLE to the tasks of influence estimation on synthetic and real data from Digg, and to predicting customer network value in Polly, a viral phone-based development-related service deployed in low-literate communities. 
Our results show that DIMPLE outperforms strong baselines.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"56 4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74198838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Term-by-Term Query Auto-Completion for Mobile Search
S. Vargas, Roi Blanco, P. Mika
{"title":"Term-by-Term Query Auto-Completion for Mobile Search","authors":"S. Vargas, Roi Blanco, P. Mika","doi":"10.1145/2835776.2835813","DOIUrl":"https://doi.org/10.1145/2835776.2835813","url":null,"abstract":"With the ever increasing usage of mobile search, where text input is typically slow and error-prone, assisting users to formulate their queries contributes to a more satisfactory search experience. Query auto-completion (QAC) techniques, which predict possible completions for user queries, are the archetypal example of query assistance and are present in most search engines. We argue, however, that classic QAC, which operates by suggesting whole-query completions, may be sub-optimal for the case of mobile search as the available screen real estate to show suggestions is limited and editing is typically slower than in desktop search. In this paper we propose the idea of term-by-term QAC, which is a new technique inspired by predictive keyboards that suggests to the user one term at a time, instead of whole-query completions. We describe an efficient mechanism to implement this technique and an adaptation of a prior user model to evaluate the effectiveness of both standard and term-by-term QAC approaches using query log data. Our experiments with a mobile query log from a commercial search engine show the validity of our approach according to this user model with respect to saved characters, saved terms and examination effort. 
Finally, a user study provides further insights about our term-by-term technique compared with standard QAC with respect to the variables analyzed in the query log-based evaluation and additional variables related to the successfulness, the speed of the interactions and the properties of the submitted queries.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74195776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Exploiting New Sentiment-Based Meta-level Features for Effective Sentiment Analysis
Sérgio D. Canuto, Marcos André Gonçalves, Fabrício Benevenuto
{"title":"Exploiting New Sentiment-Based Meta-level Features for Effective Sentiment Analysis","authors":"Sérgio D. Canuto, Marcos André Gonçalves, Fabrício Benevenuto","doi":"10.1145/2835776.2835821","DOIUrl":"https://doi.org/10.1145/2835776.2835821","url":null,"abstract":"In this paper we address the problem of automatically learning to classify the sentiment of short messages/reviews by exploiting information derived from meta-level features i.e., features derived primarily from the original bag-of-words representation. We propose new meta-level features especially designed for the sentiment analysis of short messages such as: (i) information derived from the sentiment distribution among the k nearest neighbors of a given short test document x, (ii) the distribution of distances of x to their neighbors and (iii) the document polarity of these neighbors given by unsupervised lexical-based methods. Our approach is also capable of exploiting information from the neighborhood of document x regarding (highly noisy) data obtained from 1.6 million Twitter messages with emoticons. The set of proposed features is capable of transforming the original feature space into a new one, potentially smaller and more informed. Experiments performed with a substantial number of datasets (nineteen) demonstrate that the effectiveness of the proposed sentiment-based meta-level features is not only superior to the traditional bag-of-word representation (by up to 16%) but is also superior in most cases to state-of-art meta-level features previously proposed in the literature for text classification tasks that do not take into account some idiosyncrasies of sentiment analysis. Our proposal is also largely superior to the best lexicon-based methods as well as to supervised combinations of them. 
In fact, the proposed approach is the only one to produce the best results in all tested datasets in all scenarios.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73994554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 73
Cross-modality Consistent Regression for Joint Visual-Textual Sentiment Analysis of Social Multimedia
Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang
{"title":"Cross-modality Consistent Regression for Joint Visual-Textual Sentiment Analysis of Social Multimedia","authors":"Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang","doi":"10.1145/2835776.2835779","DOIUrl":"https://doi.org/10.1145/2835776.2835779","url":null,"abstract":"Sentiment analysis of online user generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems to predict political elections, measure economic indicators, and so on. Recently, social media users are increasingly using additional images and videos to express their opinions and share their experiences. Sentiment analysis of such large-scale textual and visual content can help better extract user sentiments toward events or topics. Motivated by the needs to leverage large-scale social multimedia content for sentiment analysis, we propose a cross-modality consistent regression (CCR) model, which is able to utilize both the state-of-the-art visual and textual sentiment analysis techniques. We first fine-tune a convolutional neural network (CNN) for image sentiment analysis and train a paragraph vector model for textual sentiment analysis. On top of them, we train our multi-modality regression model. We use sentimental queries to obtain half a million training samples from Getty Images. We have conducted extensive experiments on both machine weakly labeled and manually labeled image tweets. 
The results show that the proposed model can achieve better performance than the state-of-the-art textual and visual sentiment analysis algorithms alone.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"65 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80602564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 155
Quality Management in Crowdsourcing using Gold Judges Behavior
G. Kazai, I. Zitouni
{"title":"Quality Management in Crowdsourcing using Gold Judges Behavior","authors":"G. Kazai, I. Zitouni","doi":"10.1145/2835776.2835835","DOIUrl":"https://doi.org/10.1145/2835776.2835835","url":null,"abstract":"Crowdsourcing relevance labels has become an accepted practice for the evaluation of IR systems, where the task of constructing a test collection is distributed over large populations of unknown users with widely varied skills and motivations. Typical methods to check and ensure the quality of the crowd's output is to inject work tasks with known answers (gold tasks) on which workers' performance can be measured. However, gold tasks are expensive to create and have limited application. A more recent trend is to monitor the workers' interactions during a task and estimate their work quality based on their behavior. In this paper, we show that without gold behavior signals that reflect trusted interaction patterns, classifiers can perform poorly, especially for complex tasks, which can lead to high quality crowd workers getting blocked while poorly performing workers remain undetected. Through a series of crowdsourcing experiments, we compare the behaviors of trained professional judges and crowd workers and then use the trained judges' behavior signals as gold behavior to train a classifier to detect poorly performing crowd workers. 
Our experiments show that classification accuracy almost doubles in some tasks with the use of gold behavior data.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"57 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82903758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 49
Relationship Queries on Extended Knowledge Graphs
Mohamed Yahya, Denilson Barbosa, K. Berberich, Qiuyue Wang, G. Weikum
{"title":"Relationship Queries on Extended Knowledge Graphs","authors":"Mohamed Yahya, Denilson Barbosa, K. Berberich, Qiuyue Wang, G. Weikum","doi":"10.1145/2835776.2835795","DOIUrl":"https://doi.org/10.1145/2835776.2835795","url":null,"abstract":"Entity search over text corpora is not geared for relationship queries where answers are tuples of related entities and where a query often requires joining cues from multiple documents. With large knowledge graphs, structured querying on their relational facts is an alternative, but often suffers from poor recall because of mismatches between user queries and the knowledge graph or because of weakly populated relations. This paper presents the TriniT search engine for querying and ranking on extended knowledge graphs that combine relational facts with textual web contents. Our query language is designed on the paradigm of SPO triple patterns, but is more expressive, supporting textual phrases for each of the SPO arguments. We present a model for automatic query relaxation to compensate for mismatches between the data and a user's query. Query answers -- tuples of entities -- are ranked by a statistical language model. We present experiments with different benchmarks, including complex relationship queries, over a combination of the Yago knowledge graph and the entity-annotated ClueWeb'09 corpus.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89289681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 48
Understanding Diffusion Processes: Inference and Theory
Xinran He
{"title":"Understanding Diffusion Processes: Inference and Theory","authors":"Xinran He","doi":"10.1145/2835776.2855084","DOIUrl":"https://doi.org/10.1145/2835776.2855084","url":null,"abstract":"With increasing popularity of social media and social networks sites, analyzing the social networks offers great potential to shed light on human social structure and provides great marketing opportunities. Usually, social network analysis starts with extracting or learning the social network and the associated parameters. Contrary to other analytical tasks, this step is highly non-trivial due to amorphous nature of social ties and the challenges of noisy and incomplete observations. My research focuses on improving accuracy in inferring the network as well as analyzing the consequences when the extracted network is noisy or erroneous. To be more precise, I propose to study the following two questions with a special focus on analyzing diffusion behaviors: (1) How to utilize special properties of social networks to improve accuracy of the extracted network under noisy and missing data; (2) How to characterize the impact of noise in the inferred network and carry out robust analysis and optimization. Usually the first step towards social influence analysis is to infer the diffusion network. Assuming a probabilistic model of influence and a model of how the timing of individuals’ adoption decisions correlates, one can use these data to estimate the strengths of influence between pairs of individuals. However, existing approaches for Network Inference rely on the common assumption that the observations used to train the models are complete, while missing observations are commonplace in practice due to time or technical limitations in data collection. Therefore, I propose to study the impact of incomplete observations and design efficient method to compensate for noise or incompleteness in observed data. 
I propose to exploit the fact that social networks have more specific structure than arbitrary graphs. A joint estimation of the graph generation model and the actual network structure is likely to significantly improve the estimation accuracy. Moreover, incorporating the content information of the cascade also has potential to improve the inference accuracy. Therefore, I propose to combine the Correlated Topic Model [1] and Hawkes Process [5, 4, 6] into a unified model to utilize content information [2]. Due to noise or missing data in the observations, even in the best case, one would expect that the inferred network structure and link strengths will only be an approximation to the truth; in other words, noise in the data will be pervasive for inferred social networks. I propose to focus on the algorithmic question of Influence Maximization [3] in the context of noisy social network data. More specifically, I propose to consider the following questions: Given an instance of an Influence Model, with level of mis-estimation: (1) Decide whether the objective function on this instance varies smoothly with perturbations to the parameters. (2) If the dependence is smooth, how to find a robustly nearoptimal solution.","PeriodicalId":20567,"journal":{"name":"Proceedings of the Ninth ACM International Conference on Web Search and Data Mining","volume":"02 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88843001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0