Cheng Luo, Yiqun Liu, T. Sakai, K. Zhou, Fan Zhang, Xue Li, Shaoping Ma
{"title":"Does Document Relevance Affect the Searcher's Perception of Time?","authors":"Cheng Luo, Yiqun Liu, T. Sakai, K. Zhou, Fan Zhang, Xue Li, Shaoping Ma","doi":"10.1145/3018661.3018694","DOIUrl":"https://doi.org/10.1145/3018661.3018694","url":null,"abstract":"Time plays an essential role in multiple areas of Information Retrieval (IR) studies such as search evaluation, user behavior analysis, temporal search result ranking and query understanding. Especially, in search evaluation studies, time is usually adopted as a measure to quantify users' efforts in search processes. Psychological studies have reported that the time perception of human beings can be affected by many stimuli, such as attention and motivation, which are closely related to many cognitive factors in search. Considering the fact that users' search experiences are affected by their subjective feelings of time, rather than the objective time measured by timing devices, it is necessary to look into the different factors that have impacts on search users' perception of time. In this work, we make a first step towards revealing the time perception mechanism of search users with the following contributions: (1) We establish an experimental research framework to measure the subjective perception of time while reading documents in search scenario, which originates from but is also different from traditional time perception measurements in psychological studies. (2) With the framework, we show that while users are reading result documents, document relevance has small yet visible effect on search users' perception of time. By further examining the impact of other factors, we demonstrate that the effect on relevant documents can also be influenced by individuals and tasks. (3) We conduct a preliminary experiment in which the difference between perceived time and dwell time is taken into consideration in a search evaluation task. We found that the revised framework achieved a better correlation with users' satisfaction feedbacks. This work may help us better understand the time perception mechanism of search users and provide insights in how to better incorporate time factor in search evaluation studies.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125068364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Source Code to Support Retrieval-Based Applications","authors":"Venkatesh Vinayakarao","doi":"10.1145/3018661.3022749","DOIUrl":"https://doi.org/10.1145/3018661.3022749","url":null,"abstract":"Advances in text retrieval do not apply directly to source code retrieval because of the difference in characteristics of source code when compared to text. Recently, researchers have proposed specialized indexing and querying techniques that are based on structural representations and semantics of source code. Current tools and techniques heavily depend on user defined terms in source code to leverage text retrieval techniques. Further, they have limitations in handling partial programs, platform independence and index-time processing. We focus on building reusable models of source code for the purposes of indexing and querying.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"256 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121246161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yaliang Li, Nan Du, Chaochun Liu, Yu-Zhe Xie, Wei Fan, Qi Li, Jing Gao, Huan Sun
{"title":"Reliable Medical Diagnosis from Crowdsourcing: Discover Trustworthy Answers from Non-Experts","authors":"Yaliang Li, Nan Du, Chaochun Liu, Yu-Zhe Xie, Wei Fan, Qi Li, Jing Gao, Huan Sun","doi":"10.1145/3018661.3018688","DOIUrl":"https://doi.org/10.1145/3018661.3018688","url":null,"abstract":"Nowadays, increasingly more people are receiving medical diagnoses from healthcare-related question answering platforms as people can get diagnoses quickly and conveniently. However, such diagnoses from non-expert crowdsourcing users are noisy or even wrong due to the lack of medical domain knowledge, which can cause serious consequences. To unleash the power of crowdsourcing on healthcare question answering, it is important to identify trustworthy answers and filter out noisy ones from user-generated data. Truth discovery methods estimate user reliability degrees and infer trustworthy information simultaneously, and thus these methods can be adopted to discover trustworthy diagnoses from crowdsourced answers. However, existing truth discovery methods do not take into account the rich semantic meanings of the answers. In the light of this challenge, we propose a method to automatically capture the semantic meanings of answers, where answers are represented as real-valued vectors in the semantic space. To learn such vector representations from noisy user-generated data, we tightly combine the truth discovery and vector learning processes. In this way, the learned vector representations enable truth discovery method to model the semantic relations among answers, and the information trustworthiness inferred by truth discovery can help the procedure of vector representation learning. To demonstrate the effectiveness of the proposed method, we collect a large-scale real-world dataset that involves 219,527 medical diagnosis questions and 23,657 non-expert users. Experimental results show that the proposed method improves the accuracy of identified trustworthy answers due to the successful consideration of answers' semantic meanings. Further, we demonstrate the fast convergence and good scalability of the proposed method, which makes it practical for real-world applications.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116354849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chin-Chi Hsu, Yi-An Lai, Wen-Hao Chen, Ming-Han Feng, Shou-de Lin
{"title":"Unsupervised Ranking using Graph Structures and Node Attributes","authors":"Chin-Chi Hsu, Yi-An Lai, Wen-Hao Chen, Ming-Han Feng, Shou-de Lin","doi":"10.1145/3018661.3018668","DOIUrl":"https://doi.org/10.1145/3018661.3018668","url":null,"abstract":"PageRank has been the signature unsupervised ranking model for ranking node importance in a graph. One potential drawback of PageRank is that its computation depends only on input graph structures, not considering external information such as the attributes of nodes. This work proposes AttriRank, an unsupervised ranking model that considers not only graph structure but also the attributes of nodes. AttriRank is unsupervised and domain-independent, which is different from most of the existing works requiring either ground-truth labels or specific domain knowledge. Combining two reasonable assumptions about PageRank and node attributes, AttriRank transfers extra node information into a Markov chain model to obtain the ranking. We further develop approximation for AttriRank and reduce its complexity to be linear to the number of nodes or links in the graph, which makes it feasible for large network data. The experiments show that AttriRank outperforms competing models in diverse graph ranking applications.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124411396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Patton, T. Potok, Petr Knoth, Drahomira Herrmannova
{"title":"Workshop on Scholarly Web Mining (SWM 2017)","authors":"R. Patton, T. Potok, Petr Knoth, Drahomira Herrmannova","doi":"10.1145/3018661.3022758","DOIUrl":"https://doi.org/10.1145/3018661.3022758","url":null,"abstract":"Researchers increasingly report their results through online publications, from research papers, data and software to experiments, observations and ideas. Immense amount of research-related data is available on the web on interlinked pages, in repositories, databases, social networking sites, etc. Consequently, researchers rely on online sources, often through search engines, to perform literature searches for their research âĂŤ to search for papers, topics, people etc. to be able to produce new research. However, these publications can be used not only for traditional literature searches, but also as a source for discovering popular and emerging research topics, key publications and people or evaluating research excellence. To aid research, it is important to leverage the potential of data mining technologies to improve the process of how research is being done. This workshop aims to bring together people from different backgrounds who are interested in analysing and mining scholarly data available via web and social media sources using various approaches such as query log mining, graph analysis, text mining, etc., and/or who develop systems that enable such analysis and mining. The topics of this workshop include, but are not limited to, the following areas:","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125702441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shay Ben-Elazar, G. Lavee, Noam Koenigstein, Oren Barkan, Hilik Berezin, U. Paquet, Tal Zaccai
{"title":"Groove Radio: A Bayesian Hierarchical Model for Personalized Playlist Generation","authors":"Shay Ben-Elazar, G. Lavee, Noam Koenigstein, Oren Barkan, Hilik Berezin, U. Paquet, Tal Zaccai","doi":"10.1145/3018661.3018718","DOIUrl":"https://doi.org/10.1145/3018661.3018718","url":null,"abstract":"This paper describes an algorithm designed for Microsoft's Groove music service, which serves millions of users world wide. We consider the problem of automatically generating personalized music playlists based on queries containing a ``seed'' artist and the listener's user ID. Playlist generation may be informed by a number of information sources including: user specific listening patterns, domain knowledge encoded in a taxonomy, acoustic features of audio tracks, and overall popularity of tracks and artists. The importance assigned to each of these information sources may vary depending on the specific combination of user and seed~artist. The paper presents a method based on a variational Bayes solution for learning the parameters of a model containing a four-level hierarchy of global preferences, genres, sub-genres and artists. The proposed model further incorporates a personalization component for user-specific preferences. Empirical evaluations on both proprietary and public datasets demonstrate the effectiveness of the algorithm and showcase the contribution of each of its components.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127025976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Z. Ren, Shangsong Liang, Piji Li, Shuaiqiang Wang, M. de Rijke
{"title":"Social Collaborative Viewpoint Regression with Explainable Recommendations","authors":"Z. Ren, Shangsong Liang, Piji Li, Shuaiqiang Wang, M. de Rijke","doi":"10.1145/3018661.3018686","DOIUrl":"https://doi.org/10.1145/3018661.3018686","url":null,"abstract":"A recommendation is called explainable if it not only predicts a numerical rating for an item, but also generates explanations for users' preferences. Most existing methods for explainable recommendation apply topic models to analyze user reviews to provide descriptions along with the recommendations they produce. So far, such methods have neglected user opinions and influences from social relations as a source of information for recommendations, even though these are known to improve the rating prediction. In this paper, we propose a latent variable model, called social collaborative viewpoint regression (sCVR), for predicting item ratings based on user opinions and social relations. To this end, we use so-called viewpoints, represented as tuples of a concept, topic, and a sentiment label from both user reviews and trusted social relations. In addition, such viewpoints can be used as explanations. We apply a Gibbs EM sampler to infer posterior distributions of sCVR. Experiments conducted on three large benchmark datasets show the effectiveness of our proposed method for predicting item ratings and for generating explanations.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128079533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keeping Apace with Progress in Natural Language Processing","authors":"Claire Cardie","doi":"10.1145/3018661.3022745","DOIUrl":"https://doi.org/10.1145/3018661.3022745","url":null,"abstract":"Increasingly central to online search and information discovery are methods from Natural Language Processing (NLP). Research in the field, however, is progressing at what feels a frenetic pace, making it a daunting task to keep up with the latest results. This talk will describe some of the tasks and sub-areas of NLP that have experienced fundamental advances in the state of the art over past few years, focusing on what these advances mean for understanding and extracting information from online text.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131481137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enterprise Employee Training via Project Team Formation","authors":"Jiawei Zhang, Philip S. Yu, Yuanhua Lv","doi":"10.1145/3018661.3018682","DOIUrl":"https://doi.org/10.1145/3018661.3018682","url":null,"abstract":"Professional career training for novice employees at elementary levels to help them master necessary working skills is critical for both achieving employees' professional success and enhancing the enterprise growth. Besides adopting professional services from external career training agencies, companies can actually train the employees more effectively by involving them in various internal projects carried out in the companies. In this paper, we will study the \"Employee Training\" (ET) problem by assigning the employees to various concrete company internal projects. From the company perspective, besides training the employees, another important objective of carrying out these projects is to finish them successfully. The successful accomplishment of projects depends on various issues, like the skill qualification of the built teams and the effective collaboration among the team members. To achieve these two objectives simultaneously, a novel framework named \"Team foRmAtion based employee traINing\" (TRAIN) is proposed in this paper. TRAIN formulates the ET problem as a joint optimization problem, where the objective function considers the employees' overall skill gain and the team internal communication costs at the same time. To ensure the success of the projects, a new team skill qualification constraint is proposed and added to the optimization problem. Extensive experiments conducted on the real-world enterprise employee project team dataset demonstrate the effectiveness of TRAIN in addressing the problem.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"1 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120932088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Tsikrika, Babak Akhgar, Vasilis Katos, S. Vrochidis, P. Burnap, M. Williams
{"title":"1st International Workshop on Search and Mining Terrorist Online Content & Advances in Data Science for Cyber Security and Risk on the Web","authors":"T. Tsikrika, Babak Akhgar, Vasilis Katos, S. Vrochidis, P. Burnap, M. Williams","doi":"10.1145/3018661.3022760","DOIUrl":"https://doi.org/10.1145/3018661.3022760","url":null,"abstract":"The deliberate misuse of technical infrastructure (including the Web and social media) for cyber deviant and cybercriminal behaviour, ranging from the spreading of extremist and terrorism-related material to online fraud and cyber security attacks, is on the rise. This workshop aims to better understand such phenomena and develop methods for tackling them in an effective and efficient manner. The workshop brings together interdisciplinary researchers and experts in Web search, security informatics, social media analysis, machine learning, and digital forensics, with particular interests in cyber security. The workshop programme includes refereed papers, invited talks and a panel discussion for better understanding the current landscape, as well as the future of data mining for detecting cyber deviance.","PeriodicalId":344017,"journal":{"name":"Proceedings of the Tenth ACM International Conference on Web Search and Data Mining","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116617881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}