{"title":"Mining User Intentions from Medical Queries: A Neural Network Based Heterogeneous Jointly Modeling Approach","authors":"Chenwei Zhang, Wei Fan, Nan Du, Philip S. Yu","doi":"10.1145/2872427.2874810","DOIUrl":"https://doi.org/10.1145/2872427.2874810","url":null,"abstract":"Text queries are naturally encoded with user intentions. An intention detection task tries to model and discover intentions that user encoded in text queries. Unlike conventional text classification tasks where the label of text is highly correlated with some topic-specific words, words from different topic categories tend to co-occur in medical related queries. Besides the existence of topic-specific words and word order, word correlations and the way words organized into sentence are crucial to intention detection tasks. In this paper, we present a neural network based jointly modeling approach to model and capture user intentions in medical related text queries. Regardless of the exact words in text queries, the proposed method incorporates two types of heterogeneous information: 1) pairwise word feature correlations and 2) part-of-speech tags of a sentence to jointly model user intentions. Variable-length text queries are first inherently taken care of by a fixed-size pairwise feature correlation matrix. Moreover, convolution and pooling operations are applied on feature correlations to fully exploit latent semantic structure within the query. Sentence rephrasing is finally introduced as a data augmentation technique to improve model generalization ability during model training. Experiment results on real world medical queries have shown that the proposed method is able to extract complete and precise user intentions from text queries.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76436484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Green Energy to Reduce the Operational Costs of Multi-Center Web Search Engines","authors":"Roi Blanco, Matteo Catena, N. Tonellotto","doi":"10.1145/2872427.2883021","DOIUrl":"https://doi.org/10.1145/2872427.2883021","url":null,"abstract":"Carbon dioxide emissions resulting from fossil fuels (brown energy) combustion are the main cause of global warming due to the greenhouse effect. Large IT companies have recently increased their efforts in reducing the carbon dioxide footprint originated from their data center electricity consumption. On one hand, better infrastructure and modern hardware allow for a more efficient usage of electric resources. On the other hand, data-centers can be powered by renewable sources (green energy) that are both environmental friendly and economically convenient. In this paper, we tackle the problem of targeting the usage of green energy to minimize the expenditure of running multi-center Web search engines, i.e., systems composed by multiple, geographically remote, computing facilities. We propose a mathematical model to minimize the operational costs of multi-center Web search engines by exploiting renewable energies whenever available at different locations. Using this model, we design an algorithm which decides what fraction of the incoming query load arriving into one processing facility must be forwarded to be processed at different sites to use green energy sources. We experiment using real traffic from a large search engine and we compare our model against state of the art baselines for query forwarding. Our experimental results show that the proposed solution maintains an high query throughput, while reducing by up to ~25% the energy operational costs of multi-center search engines. Additionally, our algorithm can reduce the brown energy consumption by almost 6% when energy-proportional servers are employed.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73760850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yongfeng Zhang, Qi Zhao, Yi Zhang, D. Friedman, Min Zhang, Yiqun Liu, Shaoping Ma
{"title":"Economic Recommendation with Surplus Maximization","authors":"Yongfeng Zhang, Qi Zhao, Yi Zhang, D. Friedman, Min Zhang, Yiqun Liu, Shaoping Ma","doi":"10.1145/2872427.2882973","DOIUrl":"https://doi.org/10.1145/2872427.2882973","url":null,"abstract":"A prime function of many major World Wide Web applications is Online Service Allocation (OSA), the function of matching individual consumers with particular services/goods (which may include loans or jobs as well as products) each with its own producer. In the applications of interest, consumers are free to choose, so OSA usually takes the form of personalized recommendation or search in practice. The performance metrics of recommender and search systems currently tend to focus on just one side of the match, in some cases the consumers (e.g. satisfaction) and in other cases the producers (e.g., profit). However, a sustainable OSA platform needs benefit both consumers and producers; otherwise the neglected party eventually may stop using it. In this paper, we show how to adapt economists' traditional idea of maximizing total surplus (the sum of consumer net benefit and producer profit) to the heterogeneous world of online service allocation, in an effort to promote the web intelligence for social good in online eco-systems. Modifications of traditional personalized recommendation algorithms enable us to apply Total Surplus Maximization (TSM) to three very different types of real-world tasks -- e-commerce, P2P lending and freelancing. The results for all three tasks suggest that TSM compares very favorably to currently popular approaches, to the benefit of both producers and consumers.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74494527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Cornolti, P. Ferragina, Massimiliano Ciaramita, Stefan Rüd, Hinrich Schütze
{"title":"A Piggyback System for Joint Entity Mention Detection and Linking in Web Queries","authors":"M. Cornolti, P. Ferragina, Massimiliano Ciaramita, Stefan Rüd, Hinrich Schütze","doi":"10.1145/2872427.2883061","DOIUrl":"https://doi.org/10.1145/2872427.2883061","url":null,"abstract":"In this paper we study the problem of linking open-domain web-search queries towards entities drawn from the full entity inventory of Wikipedia articles. We introduce SMAPH-2, a second-order approach that, by piggybacking on a web search engine, alleviates the noise and irregularities that characterize the language of queries and puts queries in a larger context in which it is easier to make sense of them. The key algorithmic idea underlying SMAPH-2 is to first discover a candidate set of entities and then link-back those entities to their mentions occurring in the input query. This allows us to confine the possible concepts pertinent to the query to only the ones really mentioned in it. The link-back is implemented via a collective disambiguation step based upon a supervised ranking model that makes one joint prediction for the annotation of the complete query optimizing directly the F1 measure. We evaluate both known features, such as word embeddings and semantic relatedness among entities, and several novel features such as an approximate distance between mentions and entities (which can handle spelling errors). We demonstrate that SMAPH-2 achieves state-of-the-art performance on the ERD@SIGIR2014 benchmark. We also publish GERDAQ (General Entity Recognition, Disambiguation and Annotation in Queries), a novel, public dataset built specifically for web-query entity linking via a crowdsourcing effort. SMAPH-2 outperforms the benchmarks by comparable margins also on GERDAQ.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83917914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fuzheng Zhang, Nicholas Jing Yuan, Kai Zheng, Defu Lian, Xing Xie, Y. Rui
{"title":"Exploiting Dining Preference for Restaurant Recommendation","authors":"Fuzheng Zhang, Nicholas Jing Yuan, Kai Zheng, Defu Lian, Xing Xie, Y. Rui","doi":"10.1145/2872427.2882995","DOIUrl":"https://doi.org/10.1145/2872427.2882995","url":null,"abstract":"The wide adoption of location-based services provide the potential to understand people's mobility pattern at an unprecedented level, which can also enable food-service industry to accurately predict consumers' dining behavior. In this paper, based on users' dining implicit feedbacks (restaurant visit via check-ins), explicit feedbacks (restaurant reviews) as well as some meta data (e.g., location, user demographics, restaurant attributes), we aim at recommending each user a list of restaurants for his next dining. Implicit and Explicit feedbacks of dining behavior exhibit different characteristics of user preference. Therefore, in our work, user's dining preference mainly contains two parts: implicit preference coming from check-in data (implicit feedbacks) and explicit preference coming from rating and review data (explicit feedbacks). For implicit preference, we first apply a probabilistic tensor factorization model (PTF) to capture preference in a latent subspace. Then, in order to incorporate contextual signals from meta data, we extend PTF by proposing an Implicit Preference Model (IPM), which can simultaneously capture users'/restaurants'/time' preference in the collaborative filtering and dining preference in a specific context (e.g., spatial distance preference, environmental preference). For explicit preference, we propose Explicit Preference Model (EPM) by combining matrix factorization with topic modeling to discover the user preference embedded both in rating score and text content. Finally, we design a unified model termed as Collective Implicit Explicit Preference Model (CIEPM) to combine implicit and explicit preference together for restaurant recommendation. To evaluate the performance of our system, we conduct extensive experiments with large-scale datasets covering hundreds of thousands of users and restaurants. The results reveal that our system is effective for restaurant recommendation.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87797722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Relevance of Irrelevant Alternatives","authors":"Austin R. Benson, Ravi Kumar, A. Tomkins","doi":"10.1145/2872427.2883025","DOIUrl":"https://doi.org/10.1145/2872427.2883025","url":null,"abstract":"Multinomial logistic regression is a powerful tool to model choice from a finite set of alternatives, but it comes with an underlying model assumption called the independence of irrelevant alternatives, stating that any item added to the set of choices will decrease all other items' likelihood by an equal fraction. We perform statistical tests of this assumption across a variety of datasets and give results showing how often it is violated. When this axiom is violated, choice theorists will often invoke a richer model known as nested logistic regression, in which information about competition among items is encoded in a tree structure known as a nest. However, to our knowledge there are no known algorithms to induce the correct nest structure. We present the first such algorithm, which runs in quadratic time under an oracle model, and we pair it with a matching lower bound. We then perform experiments on synthetic and real datasets to validate the algorithm, and show that nested logit over learned nests outperforms traditional multinomial regression. Finally, in addition to automatically learning nests, we show how nests may be constructed by hand to test hypotheses about the data, and evaluated by their explanatory power.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82243638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Analysis of Algorithmic Pricing on Amazon Marketplace","authors":"Le Chen, A. Mislove, Christo Wilson","doi":"10.1145/2872427.2883089","DOIUrl":"https://doi.org/10.1145/2872427.2883089","url":null,"abstract":"The rise of e-commerce has unlocked practical applications for algorithmic pricing (also called dynamic pricing algorithms), where sellers set prices using computer algorithms. Travel websites and large, well known e-retailers have already adopted algorithmic pricing strategies, but the tools and techniques are now available to small-scale sellers as well. While algorithmic pricing can make merchants more competitive, it also creates new challenges. Examples have emerged of cases where competing pieces of algorithmic pricing software interacted in unexpected ways and produced unpredictable prices, as well as cases where algorithms were intentionally designed to implement price fixing. Unfortunately, the public currently lack comprehensive knowledge about the prevalence and behavior of algorithmic pricing algorithms in-the-wild. In this study, we develop a methodology for detecting algorithmic pricing, and use it empirically to analyze their prevalence and behavior on Amazon Marketplace. We gather four months of data covering all merchants selling any of 1,641 best-seller products. Using this dataset, we are able to uncover the algorithmic pricing strategies adopted by over 500 sellers. We explore the characteristics of these sellers and characterize the impact of these strategies on the dynamics of the marketplace.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81711552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beyond the Baseline: Establishing the Value in Mobile Phone Based Poverty Estimates","authors":"Chris Smith-Clarke, L. Capra","doi":"10.1145/2872427.2883076","DOIUrl":"https://doi.org/10.1145/2872427.2883076","url":null,"abstract":"Within the remit of `Data for Development' there have been a number of promising recent works that investigate the use of mobile phone Call Detail Records (CDRs) to estimate the spatial distribution of poverty or socio-economic status. The methods being developed have the potential to offer immense value to organisations and agencies who currently struggle to identify the poorest parts of a country, due to the lack of reliable and up to date survey data in certain parts of the world. However, the results of this research have thus far only been presented in isolation rather than in comparison to any alternative approach or benchmark. Consequently, the true practical value of these methods remains unknown. Here, we seek to allay this shortcoming, by proposing two baseline poverty estimators grounded on concrete usage scenarios: one that exploits correlation with population density only, to be used when no poverty data exists at all; and one that also exploits spatial autocorrelation, to be used when poverty data has been collected for a few regions within a country. We then compare the predictive performance of these baseline models with models that also include features derived from CDRs, so to establish their real added value. We present extensive analysis of the performance of all these models on data acquired for two developing countries -- Senegal and Ivory Coast. Our results reveal that CDR-based models do provide more accurate estimates in most cases; however, the improvement is modest and more significant when estimating (extreme) poverty intensity rates rather than mean wealth.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90582530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised, Efficient and Semantic Expertise Retrieval","authors":"Christophe Van Gysel, M. de Rijke, M. Worring","doi":"10.1145/2872427.2882974","DOIUrl":"https://doi.org/10.1145/2872427.2882974","url":null,"abstract":"We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a statistically significant improved ranking over vector space and generative models in most cases, matching the performance of supervised methods on various benchmarks. That is, by using solely text we can do as well as methods that work with external evidence and/or relevance feedback. A contrastive analysis of rankings produced by discriminative and generative approaches shows that they have complementary strengths due to the ability of the unsupervised discriminative model to perform semantic matching.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79323060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Just in Time: Controlling Temporal Performance in Crowdsourcing Competitions","authors":"Markus Rokicki, Sergej Zerr, Stefan Siersdorfer","doi":"10.1145/2872427.2883075","DOIUrl":"https://doi.org/10.1145/2872427.2883075","url":null,"abstract":"Many modern data analytics applications in areas such as crisis management, stock trading, and healthcare, rely on components capable of nearly real-time processing of streaming data produced at varying rates. In addition to automatic processing methods, many tasks involved in those applications require further human assessment and analysis. However, current crowdsourcing platforms and systems do not support stream processing with variable loads. In this paper, we investigate how incentive mechanisms in competition based crowdsourcing can be employed in such scenarios. More specifically, we explore techniques for stimulating workers to dynamically adapt to both anticipated and sudden changes in data volume and processing demand, and we analyze effects such as data processing throughput, peak-to-average ratios, and saturation effects. To this end, we study a wide range of incentive schemes and utility functions inspired by real world applications. Our large-scale experimental evaluation with more than 900 participants and more than 6200 hours of work spent by crowd workers demonstrates that our competition based mechanisms are capable of adjusting the throughput of online workers and lead to substantial on-demand performance boosts.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79697598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}