Proceedings of the Tenth ACM International Conference on Web Search and Data Mining最新文献_第8页

Delving Deep into Personal Photo and Video Search 深入研究个人照片和视频搜索

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018736

Lu Jiang, Yannis Kalantidis, Liangliang Cao, S. Farfade, Jiliang Tang, Alexander Hauptmann

{"title":"Delving Deep into Personal Photo and Video Search","authors":"Lu Jiang, Yannis Kalantidis, Liangliang Cao, S. Farfade, Jiliang Tang, Alexander Hauptmann","doi":"10.1145/3018661.3018736","DOIUrl":"https://doi.org/10.1145/3018661.3018736","url":null,"abstract":"The ubiquity of mobile devices and cloud services has led to an unprecedented growth of online personal photo and video collections. Due to the scarcity of personal media search log data, research to date has mainly focused on searching images and videos on the web. However, in order to manage the exploding amount of personal photos and videos, we raise a fundamental question: what are the differences and similarities when users search their own photos versus the photos on the web? To the best of our knowledge, this paper is the first to study personal media search using large-scale real-world search logs. We analyze different types of search sessions mined from Flickr search logs and discover a number of interesting characteristics of personal media search in terms of information needs and click behaviors. The insightful observations will not only be instrumental in guiding future personal media search methods, but also benefit related tasks such as personal photo browsing and recommendation. Our findings suggest there is a significant gap between personal queries and automatically detected concepts, which is responsible for the low accuracy of many personal media search queries. To bridge the gap, we propose the deep query understanding model to learn a mapping from the personal queries to the concepts in the clicked photos. Experimental results verify the efficacy of the proposed method in improving personal media search, where the proposed method consistently outperforms baseline methods.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127311567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 17

A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation 一次文档和一次分数查询评估的比较

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018726

Matt Crane, J. Culpepper, Jimmy J. Lin, J. Mackenzie, A. Trotman, D. Cheriton

{"title":"A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation","authors":"Matt Crane, J. Culpepper, Jimmy J. Lin, J. Mackenzie, A. Trotman, D. Cheriton","doi":"10.1145/3018661.3018726","DOIUrl":"https://doi.org/10.1145/3018661.3018726","url":null,"abstract":"We present an empirical comparison between document-at-a-time (DaaT) and score-at-a-time (SaaT) document ranking strategies within a common framework. Although both strategies have been extensively explored, the literature lacks a fair, direct comparison: such a study has been difficult due to vastly different query evaluation mechanics and index organizations. Our work controls for score quantization, document processing, compression, implementation language, implementation effort, and a number of details, arriving at an empirical evaluation that fairly characterizes the performance of three specific techniques: WAND (DaaT), BMW (DaaT), and JASS (SaaT). Experiments reveal a number of interesting findings. The performance gap between WAND and BMW is not as clear as the literature suggests, and both methods are susceptible to tail queries that may take orders of magnitude longer than the median query to execute. Surprisingly, approximate query evaluation in WAND and BMW does not significantly reduce the risk of these tail queries. Overall, JASS is slightly slower than either WAND or BMW, but exhibits much lower variance in query latencies and is much less susceptible to tail query effects. Furthermore, JASS query latency is not particularly sensitive to the retrieval depth, making it an appealing solution for performance-sensitive applications where bounds on query latencies are desirable.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126054835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 56

Primum Non Nocere: Healthcare In The Digital Age 最重要的是:数字时代的医疗保健

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3022765

Anjali Joshi

{"title":"Primum Non Nocere: Healthcare In The Digital Age","authors":"Anjali Joshi","doi":"10.1145/3018661.3022765","DOIUrl":"https://doi.org/10.1145/3018661.3022765","url":null,"abstract":"Internet search has become the first stop in many users' health journeys. Today, about 1 in 20 Google searches are related to healthcare. These queries span a broad range of information needs as people are looking for possible conditions related to their symptoms and are seeking to understand their diagnoses and prescribed treatments, decipher their test results, find pathways of self-care as well as connect to people with similar experiences. This talk will cover the approaches that we at Google have used to meet these diverse user needs. We will also discuss how we constructed and curated the Health Knowledge Graph, a large scale resource of highly accurate medical knowledge that powers many of our health applications. In the second part of the talk, we will show how the confluence of advances in technology enables us to revolutionize health data collection and perform it at unprecedented scale and granularity. Combined with contextual signals, anonymous aggregated user activity can be used to quantify public health phenomena and provide concerned authorities with actionable information about seasonal or situational health issues. We will conclude the talk with an outline of research directions that could enable people and organizations in personal and public health settings obtain actionable information in a timely manner. The work presented here was the product of collaboration of multiple teams at Google.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127892471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Partitioning and Segment Organization Strategies for Real-Time Selective Search on Document Streams 文档流实时选择搜索的分区和段组织策略

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018727

Yulu Wang, Jimmy J. Lin

{"title":"Partitioning and Segment Organization Strategies for Real-Time Selective Search on Document Streams","authors":"Yulu Wang, Jimmy J. Lin","doi":"10.1145/3018661.3018727","DOIUrl":"https://doi.org/10.1145/3018661.3018727","url":null,"abstract":"The basic idea behind selective search is to partition a collection into topical clusters, and for each query, consider only a subset of the clusters that are likely to contain relevant documents. Previous work on web collections has shown that it is possible to retain high-quality results while considering only a small fraction of the collection. These studies, however, assume static collections where it is feasible to run batch clustering algorithms for partitioning. In this work, we consider the novel formulation of selective search on document streams (specifically, tweets), where partitioning must be performed incrementally. In our approach, documents are partitioned into temporal segments and selective search is performed within each segment: these segments can either be clustered using batch or online algorithms, and at different temporal granularities. For efficiency, we take advantage of word embeddings to reduce the dimensionality of the document vectors. Experiments with test collections from the TREC Microblog Tracks show that we are able to achieve precision indistinguishable from exhaustive search while considering only around 5% of the collection. Interestingly, we observe no significant effectiveness differences between batch vs. online clustering and between hourly vs. daily temporal segments, despite them being very different index organizations. This suggests that architectural choices should be primarily guided by efficiency considerations.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"93 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127981646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Evolution of Ego-networks in Social Media with Link Recommendations 基于链接推荐的社交媒体自我网络演化

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018733

L. Aiello, Nicola Barbieri

引用次数: 25

Counting Graphlets: Space vs Time 计算graphlet:空间vs时间

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018732

M. Bressan, Flavio Chierichetti, Ravi Kumar, S. Leucci, A. Panconesi

{"title":"Counting Graphlets: Space vs Time","authors":"M. Bressan, Flavio Chierichetti, Ravi Kumar, S. Leucci, A. Panconesi","doi":"10.1145/3018661.3018732","DOIUrl":"https://doi.org/10.1145/3018661.3018732","url":null,"abstract":"Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural approaches based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that this approach is outperformed by a carefully engineered version of color coding (CC) [1], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC. Furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price however. While MC is very efficient in terms of space, CC's memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that a careful implementation of CC can push the limits of the state of the art, both in terms of the size of the input graph and of that of the graphlets.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133952142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 75

Mining Actionable Insights from Social Networksat WSDM 2017 从社交网络中挖掘可操作的见解

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3022759

F. Ensan, Z. Noorian, E. Bagheri

引用次数: 1

Leveraging Behavioral Factorization and Prior Knowledge for Community Discovery and Profiling 利用行为分解和先验知识进行社区发现和分析

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018693

Mohammad Akbari, Tat-Seng Chua

{"title":"Leveraging Behavioral Factorization and Prior Knowledge for Community Discovery and Profiling","authors":"Mohammad Akbari, Tat-Seng Chua","doi":"10.1145/3018661.3018693","DOIUrl":"https://doi.org/10.1145/3018661.3018693","url":null,"abstract":"Recently community detection has attracted much interest in social media to understand the collective behaviours of users and allow individuals to be modeled in the context of the group. Most existing approaches for community detection exploit either users' social links or their published content, aiming at discovering groups of densely connected or highly similar users. They often fail to find effective communities due to excessive noise in content, sparsity in links, and heterogenous behaviours of users in social media. Further, they are unable to provide insights and rationales behind the formation of the group and the collective behaviours of the users. To tackle these challenges, we propose to discover communities in a low- dimensional latent space in which we simultaneously learn the representation of users and communities. In particular, we integrated different social views of the network into a low-dimensional latent space in which we sought dense clusters of users as communities. By imposing a Laplacian regularizer into affiliation matrix, we further incorporated prior knowledge into the community discovery process. Finally community profiles were computed by a linear operator integrating the profiles of members. Taking the wellness domain as an example, we conducted experiments on a large scale real world dataset of users tweeting about diabetes and its related concepts, which demonstrate the effectiveness of our approach in discovering and profiling user communities.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131064298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 22

Label Informed Attributed Network Embedding 标签通知属性网络嵌入

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3018667

Xiao Huang, Jundong Li, Xia Hu

{"title":"Label Informed Attributed Network Embedding","authors":"Xiao Huang, Jundong Li, Xia Hu","doi":"10.1145/3018661.3018667","DOIUrl":"https://doi.org/10.1145/3018661.3018667","url":null,"abstract":"Attributed network embedding aims to seek low-dimensional vector representations for nodes in a network, such that original network topological structure and node attribute proximity can be preserved in the vectors. These learned representations have been demonstrated to be helpful in many learning tasks such as network clustering and link prediction. While existing algorithms follow an unsupervised manner, nodes in many real-world attributed networks are often associated with abundant label information, which is potentially valuable in seeking more effective joint vector representations. In this paper, we investigate how labels can be modeled and incorporated to improve attributed network embedding. This is a challenging task since label information could be noisy and incomplete. In addition, labels are completely distinct with the geometrical structure and node attributes. The bewildering combination of heterogeneous information makes the joint vector representation learning more difficult. To address these issues, we propose a novel Label informed Attributed Network Embedding (LANE) framework. It can smoothly incorporate label information into the attributed network embedding while preserving their correlations. Experiments on real-world datasets demonstrate that the proposed framework achieves significantly better performance compared with the state-of-the-art embedding algorithms.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114424045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 427

Ten Years of Wisdom 十年智慧

Proceedings of the Tenth ACM International Conference on Web Search and Data Mining Pub Date : 2017-02-02 DOI: 10.1145/3018661.3022744

R. Baeza-Yates

引用次数: 1