{"title":"Tutorial on Task-Based Search and Assistance","authors":"C. Shah, Ryen W. White","doi":"10.1145/3397271.3401422","DOIUrl":"https://doi.org/10.1145/3397271.3401422","url":null,"abstract":"While great strides are made in the field of search and recommendation, there are still challenges and opportunities to address information access issues that involve solving tasks and accomplishing goals for a wide variety of users. Specifically, we lack intelligent systems that can detect not only the request an individual is making (what), but also understand and utilize the intention (why) and strategies (how) while providing information. Many scholars in the fields of information retrieval, recommender systems, productivity (especially in task management and time management), and artificial intelligence have recognized the importance of extracting and understanding people's tasks and the intentions behind performing those tasks in order to serve them better. However, we are still struggling to support them in task completion, e.g., in search and assistance, it has been challenging to move beyond single-query or single-turn interactions. The proliferation of intelligent agents has opened up new modalities for interacting with information, but these agents will need to be able to work more intelligently in understanding the context and helping the users at task level. This tutorial will introduce the attendees to the issues of detecting, understanding, and using task and task-related information in an information episode (with or without active searching). Specifically, it will cover several recent theories, models, and methods that show how to represent tasks and use behavioral data to extract task information. 
It will then show how this knowledge or model could contribute to addressing emerging retrieval and recommendation problems.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126050453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metadata Matters in User Engagement Prediction","authors":"Xiang Chen, Saayan Mitra, Viswanathan Swaminathan","doi":"10.1145/3397271.3401201","DOIUrl":"https://doi.org/10.1145/3397271.3401201","url":null,"abstract":"Predicting user engagement (e.g., click-through rate, conversion rate) on display ads plays a critical role in delivering the right ad to the right user in online advertising. Existing techniques, spanning Logistic Regression to Factorization Machines and their derivatives, focus on modeling the interactions among handcrafted features to predict user engagement. Little attention has been paid to how the ad fits with the context (e.g., hosted webpage, user demographics). In this paper, we propose to include the metadata feature, which captures the visual appearance of the ad, in the user engagement prediction task. In particular, given a data sample, we combine both the basic context features, which have been widely used in existing prediction models, and the metadata feature, which is extracted from the ad using a state-of-the-art deep learning framework, to predict user engagement. To demonstrate the effectiveness of the proposed metadata feature, we compare the performance of the widely used prediction models before and after integrating the metadata feature. 
Our experimental results on a real-world dataset demonstrate that the metadata feature is able to further improve the prediction performance.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129360093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Residual-Duet Network with Tree Dependency Representation for Chinese Question-Answering Sentiment Analysis","authors":"Guangyi Hu, Chongyang Shi, Shufeng Hao, Yunru Bai","doi":"10.1145/3397271.3401226","DOIUrl":"https://doi.org/10.1145/3397271.3401226","url":null,"abstract":"Question-answering sentiment analysis (QASA) is a novel but meaningful sentiment analysis task based on question-answering online reviews. Existing neural network-based models that conduct sentiment analysis of online reviews have already achieved great success. However, the syntax and implicitly semantic connection in the dependency tree have not been made full use of, especially for Chinese which has specific syntax. In this work, we propose a Residual-Duet Network leveraging textual and tree dependency information for Chinese question-answering sentiment analysis. In particular, we explore the synergies of graph embedding with structural dependency links to learn syntactic information. The transverse and longitudinal compression encoders are developed to capture sentiment evidence with disparate types of compression and different residual connections. We evaluate our model on three Chinese QASA datasets in different domains. 
Experimental results demonstrate the superiority of our proposed model in Chinese question-answering sentiment analysis.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129214909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"APS: An Active PubMed Search System for Technology Assisted Reviews","authors":"Dan Li, Panagiotis Zafeiriadis, E. Kanoulas","doi":"10.1145/3397271.3401401","DOIUrl":"https://doi.org/10.1145/3397271.3401401","url":null,"abstract":"Systematic reviews constitute the cornerstone of Evidence-based Medicine. They can provide guidance to medical policy-making by synthesizing all available studies regarding a certain topic. However, conducting systematic reviews has become a laborious and time-consuming task due to the large amount and rapid growth of published literature. Technology-assisted review (TAR) approaches aim to accelerate the screening stage of systematic reviews by combining machine learning algorithms and human relevance feedback. In this work, we built an online active search system for systematic reviews, named APS, by applying a state-of-the-art TAR approach, Continuous Active Learning. The system is built on top of the PubMed collection, which is a widely used database of biomedical literature. It allows users to conduct the abstract screening for systematic reviews. We demonstrate the effectiveness and robustness of APS in detecting relevant literature and reducing workload for systematic reviews using the CLEF TAR 2017 benchmark.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131305484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study of Methods for the Generation of Domain-Aware Word Embeddings","authors":"Dominic Seyler, Chengxiang Zhai","doi":"10.1145/3397271.3401287","DOIUrl":"https://doi.org/10.1145/3397271.3401287","url":null,"abstract":"Word embeddings are essential components for many text data applications. In most work, \"out-of-the-box\" embeddings trained on general text corpora are used, but they can be less effective when applied to domain-specific settings. Thus, how to create \"domain-aware\" word embeddings is an interesting open research question. In this paper, we study three methods for creating domain-aware word embeddings based on both general and domain-specific text corpora, including concatenation of embedding vectors, weighted fusion of text data, and interpolation of aligned embedding vectors. Even though the investigated strategies are tailored for domain-specific tasks, they are general enough to be applied to any domain and are not specific to a single task. Experimental results show that all three methods can work well, however, the interpolation method consistently works best.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128862062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint-modal Distribution-based Similarity Hashing for Large-scale Unsupervised Deep Cross-modal Retrieval","authors":"Song Liu, Shengsheng Qian, Yang Guan, Jiawei Zhan, Long Ying","doi":"10.1145/3397271.3401086","DOIUrl":"https://doi.org/10.1145/3397271.3401086","url":null,"abstract":"Hashing-based cross-modal search, which aims to map multiple modality features into binary codes, has attracted increasing attention due to its storage and search efficiency, especially in large-scale database retrieval. Recent unsupervised deep cross-modal hashing methods have shown promising results. However, existing approaches typically suffer from two limitations: (1) They usually learn cross-modal similarity information separately or in a redundant fusion manner, which may fail to capture semantic correlations among instances from different modalities sufficiently and effectively. (2) They seldom consider the sampling and weighting schemes for unsupervised cross-modal hashing, resulting in a lack of satisfactory discriminative ability in hash codes. To overcome these limitations, we propose a novel unsupervised deep cross-modal hashing method called Joint-modal Distribution-based Similarity Hashing (JDSH) for large-scale cross-modal retrieval. Firstly, we propose a novel cross-modal joint-training method by constructing a joint-modal similarity matrix to fully preserve the cross-modal semantic correlations among instances. Secondly, we propose a sampling and weighting scheme termed the Distribution-based Similarity Decision and Weighting (DSDW) method for unsupervised cross-modal hashing, which is able to generate more discriminative hash codes by pushing semantically similar instance pairs closer and pulling semantically dissimilar instance pairs apart. 
The experimental results demonstrate the superiority of JDSH compared with several unsupervised cross-modal hashing methods on two public datasets NUS-WIDE and MIRFlickr.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121951011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonlinear Robust Discrete Hashing for Cross-Modal Retrieval","authors":"Zhan Yang, J. Long, Lei Zhu, Wenti Huang","doi":"10.1145/3397271.3401152","DOIUrl":"https://doi.org/10.1145/3397271.3401152","url":null,"abstract":"Hashing techniques have recently been successfully applied to solve similarity search problems in the information retrieval field because of their significantly reduced storage and high-speed search capabilities. However, the hash codes learned from most recent cross-modal hashing methods lack the ability to comprehensively preserve adequate information, resulting in a less than desirable performance. To solve this limitation, we propose a novel method termed Nonlinear Robust Discrete Hashing (NRDH), for cross-modal retrieval. The main idea behind NRDH is motivated by the success of neural networks, i.e., nonlinear descriptors, in the field of representation learning, and the use of nonlinear descriptors instead of simple linear transformations is more in line with the complex relationships that exist between common latent representation and heterogeneous multimedia data in the real world. In NRDH, we first learn a common latent representation through nonlinear descriptors to encode complementary and consistent information from the features of the heterogeneous multimedia data. Moreover, an asymmetric learning scheme is proposed to correlate the learned hash codes with the common latent representation. Empirically, we demonstrate that NRDH is able to successfully generate a comprehensive common latent representation that significantly improves the quality of the learned hash codes. Then, NRDH adopts a linear learning strategy to fast learn the hash function with the learned hash codes. 
Extensive experiments performed on two benchmark datasets highlight the superiority of NRDH over several state-of-the-art methods.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120995368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query Rewriting for Voice Shopping Null Queries","authors":"Iftah Gamzu, Marina Haikin, N. Halabi","doi":"10.1145/3397271.3401052","DOIUrl":"https://doi.org/10.1145/3397271.3401052","url":null,"abstract":"Voice shopping using natural language introduces new challenges related to customer queries, like handling mispronounced, misexpressed, and misunderstood queries. Voice null queries, which result in no offers, have a negative impact on customers' shopping experience. Query rewriting (QR) attempts to automatically replace null queries with alternatives that lead to relevant results. We present a new approach for pre-retrieval QR of voice shopping null queries. Our proposed QR framework first generates alternative queries using a search index-based approach that targets different potential failures in voice queries. Then, a machine-learning component ranks these alternatives, and the original query is amended by the selected alternative. We provide an experimental evaluation of our approach based on data logs of a commercial voice assistant and an e-commerce website, demonstrating that it outperforms several baselines by more than 22%. Our evaluation also highlights an interesting phenomenon, showing that web shopping null queries are considerably different, and apparently easier to fix, than voice queries. This further substantiates the use of specialized mechanisms for the voice domain. 
We believe that our proposed framework, mapping tail queries to head queries, is of independent interest since it can be extended and applied to other domains.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126588178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting User Community in Sparse Domain via Cross-Graph Pairwise Learning","authors":"Zheng Gao, Hongsong Li, Zhuoren Jiang, Xiaozhong Liu","doi":"10.1145/3397271.3401055","DOIUrl":"https://doi.org/10.1145/3397271.3401055","url":null,"abstract":"Cyberspace hosts abundant interactions between users and different kinds of objects, and their relations are often encapsulated as bipartite graphs. Detecting user community in such heterogeneous graphs is an essential task to uncover user information needs and to further enhance recommendation performance. While several main cyber domains carry high-quality graphs, most others can unfortunately be quite sparse. However, as users may appear in multiple domains (graphs), their high-quality activities in the main domains can supply community detection in the sparse ones, e.g., user behaviors on Google can help thousands of applications to locate his/her local community when s/he uses a Google ID to log in to those applications. In this paper, our model, Pairwise Cross-graph Community Detection (PCCD), is proposed to cope with the sparse graph problem by involving external graph knowledge to learn user pairwise community closeness instead of detecting direct communities. Particularly in our model, to avoid taking excessive propagated information, a two-level filtering module is utilized to select the most informative connections through both community and node level filters. Subsequently, a Community Recurrent Unit (CRU) is designed to estimate pairwise user community closeness. Extensive experiments on two real-world graph datasets validate our model against several strong alternatives. 
Supplementary experiments also validate its robustness on graphs with varied sparsity scales.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125354213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Social Media for Medical Text Simplification","authors":"Nikhil Pattisapu, Nishant Prabhu, Smriti Bhati, Vasudeva Varma","doi":"10.1145/3397271.3401105","DOIUrl":"https://doi.org/10.1145/3397271.3401105","url":null,"abstract":"Patients are increasingly using the web for understanding medical information, making health decisions, and validating physicians' advice. However, most of this content is tailored to an expert audience, due to which people with inadequate health literacy often find it difficult to access, comprehend, and act upon this information. Medical text simplification aims to alleviate this problem by computationally simplifying medical text. Most text simplification methods employ neural seq-to-seq models for this task. However, training such models requires a corpus of aligned complex and simple sentences. Creating such a dataset manually is effort intensive, while creating it automatically is prone to alignment errors. To overcome these challenges, we propose a denoising autoencoder based neural model for this task which leverages the simplistic writing style of medical social media text. Experiments on four datasets show that our method significantly outperforms the best known medical text simplification models across multiple automated and human evaluation metrics. 
Our model achieves an improvement of up to 16.52% over the existing best performing model on SARI which is the primary metric to evaluate text simplification models.","PeriodicalId":252050,"journal":{"name":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127883356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}