Gayle McElvain, George Sanchez, Don Teo, Tonya Custis
{"title":"Non-factoid Question Answering in the Legal Domain","authors":"Gayle McElvain, George Sanchez, Don Teo, Tonya Custis","doi":"10.1145/3331184.3331431","DOIUrl":"https://doi.org/10.1145/3331184.3331431","url":null,"abstract":"Non-factoid question answering in the legal domain must provide legally correct, jurisdictionally relevant, and conversationally responsive answers to user-entered questions. We present work done on a QA system that is entirely based on IR and NLP, and does not rely on a structured knowledge base. Our system retrieves concise one-sentence answers for basic questions about the law. It is not restricted in scope to particular topics or jurisdictions. The corpus of potential answers contains approximately 22M documents classified to over 120K legal topics.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88441278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TDP: Personalized Taxi Demand Prediction Based on Heterogeneous Graph Embedding","authors":"Zhenlong Zhu, Ruixuan Li, Minghui Shan, Yuhua Li, Lu Gao, Fei Wang, Jixing Xu, X. Gu","doi":"10.1145/3331184.3331368","DOIUrl":"https://doi.org/10.1145/3331184.3331368","url":null,"abstract":"Predicting users' irregular trips in a short term period is one of the crucial tasks in the intelligent transportation system. With the prediction, the taxi requesting services, such as Didi Chuxing in China, can manage the transportation resources to offer better services. There are several different transportation scenes, such as commuting scene and entertainment scene. The origin and the destination of entertainment scene are more unsure than that of commuting scene, so both origin and destination should be predicted. Moreover, users' trips on Didi platform is only a part of their real life, so these transportation data are only few weak samples. To address these challenges, in this paper, we propose Taxi Demand Prediction (TDP) model in challenging entertainment scene based on heterogeneous graph embedding and deep neural predicting network. TDP aims to predict next possible trip edges that have not appeared in historical data for each user in entertainment scene. Experimental results on the real-world dataset show that TDP achieves significant improvements over the state-of-the-art methods.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88080931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Collaborative Neural Network for Robust Recommendation","authors":"Feng Yuan, Lina Yao, B. Benatallah","doi":"10.1145/3331184.3331321","DOIUrl":"https://doi.org/10.1145/3331184.3331321","url":null,"abstract":"Most of recent neural network(NN)-based recommendation techniques mainly focus on improving the overall performance, such as hit ratio for top-N recommendation, where the users' feedbacks are considered as the ground-truth. In real-world applications, those feedbacks are possibly contaminated by imperfect user behaviours, posing challenges on the design of robust recommendation methods. Some methods apply man-made noises on the input data to train the networks more effectively (e.g. the collaborative denoising auto-encoder). In this work, we propose a general adversarial training framework for NN-based recommendation models, improving both the model robustness and the overall performance. We apply our approach on the collaborative auto-encoder model, and show that the combination of adversarial training and NN-based models outperforms highly competitive state-of-the-art recommendation methods on three public datasets.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90374792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Curation of Content Tables for Educational Videos","authors":"Arpan Mukherjee, Shubhi Tiwari, Tanya Chowdhury, Tanmoy Chakraborty","doi":"10.1145/3331184.3331400","DOIUrl":"https://doi.org/10.1145/3331184.3331400","url":null,"abstract":"Traditional forms of education are increasingly being replaced by online forms of learning. With many degrees being awarded without the requirement of co-location, it becomes necessary to build tools to enhance online learning interfaces. Online educational videos are often long and do not have enough metadata. Viewers trying to learn about a particular topic have to go through the entire video to find suitable content. We present a novel architecture to curate content tables for educational videos. We harvest text and acoustic properties of the videos to form a hierarchical content table (similar to a table of contents available in a textbook). We allow users to browse the video smartly by skipping to a particular portion rather than going through the entire video. We consider other text-based approaches as our baselines. We find that our approach beats the macro F1-score and micro F1-score of baseline by 39.45% and 35.76% respectively. We present our demo as an independent web page where the user can paste the URL of the video to obtain a generated hierarchical table of contents and navigate to the required content. In the spirit of reproducibility, we make our code public at https://goo.gl/Qzku9d and provide a screen cast to be viewed at https://goo.gl/4HSV1v.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84783974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information Cascades Modeling via Deep Multi-Task Learning","authors":"Xueqin Chen, Kunpeng Zhang, Fan Zhou, Goce Trajcevski, Ting Zhong, Fengli Zhang","doi":"10.1145/3331184.3331288","DOIUrl":"https://doi.org/10.1145/3331184.3331288","url":null,"abstract":"Effectively modeling and predicting the information cascades is at the core of understanding the information diffusion, which is essential for many related downstream applications, such as fake news detection and viral marketing identification. Conventional methods for cascade prediction heavily depend on the hypothesis of diffusion models and hand-crafted features. Owing to the significant recent successes of deep learning in multiple domains, attempts have been made to predict cascades by developing neural networks based approaches. However, the existing models are not capable of capturing both the underlying structure of a cascade graph and the node sequence in the diffusion process which, in turn, results in unsatisfactory prediction performance. In this paper, we propose a deep multi-task learning framework with a novel design of shared-representation layer to aid in explicitly understanding and predicting the cascades. As it turns out, the learned latent representation from the shared-representation layer can encode the structure and the node sequence of the cascade very well. Our experiments conducted on real-world datasets demonstrate that our method can significantly improve the prediction accuracy and reduce the computational cost compared to state-of-the-art baselines.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82085716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"USEing Transfer Learning in Retrieval of Statistical Data","authors":"A. Firsov, Vladimir Bugay, A. Karpenko","doi":"10.1145/3331184.3331427","DOIUrl":"https://doi.org/10.1145/3331184.3331427","url":null,"abstract":"DSSM-like models showed good results in retrieval of short documents that semantically match the query. However, these models require large collections of click-through data that are not available in some domains. On the other hand, the recent advances in NLP demonstrated the possibility to fine-tune language models and models trained on one set of tasks to achieve a state of the art results on a multitude of other tasks or to get competitive results using much smaller training sets. Following this trend, we combined DSSM-like architecture with USE (Universal Sentence Encoder) and BERT (Bidirectional Encoder Representations from Transformers) models in order to be able to fine-tune them on a small amount of click-through data and use them for information retrieval. This approach allowed us to significantly improve our search engine for statistical data.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"72 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89251980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lauri Kangassalo, Michiel M. A. Spapé, Giulio Jacucci, Tuukka Ruotsalo
{"title":"Why do Users Issue Good Queries?: Neural Correlates of Term Specificity","authors":"Lauri Kangassalo, Michiel M. A. Spapé, Giulio Jacucci, Tuukka Ruotsalo","doi":"10.1145/3331184.3331243","DOIUrl":"https://doi.org/10.1145/3331184.3331243","url":null,"abstract":"Despite advances in the past few decades in studying what kind of queries users input to search engines and how to suggest queries for the users, the fundamental question of what makes human cognition able to estimate goodness of query terms is largely unanswered. For example, a person searching information about \"cats'' is able to choose query terms, such as \"housecat'', \"feline'', or \"animal'' and avoid terms like \"similar'', \"variety'', and \"distinguish''. We investigated the association between the specificity of terms occurring in documents and human brain activity measured via electroencephalography (EEG). We analyzed the brain activity data of fifteen participants, recorded in response to reading terms from Wikipedia documents. Term specificity was shown to be associated with the amplitude of evoked brain responses. The results indicate that by being able to determine which terms carry maximal information about, and can best discriminate between, documents, people have the capability to enter good query terms. Moreover, our results suggest that the effective query term selection process, often observed in practical search behavior studies, has a neural basis. We believe our findings constitute an important step in revealing the cognitive processing behind query formulation and evaluating informativeness of language in general.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89277869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple Query Processing via Logic Function Factoring","authors":"Matteo Catena, N. Tonellotto","doi":"10.1145/3331184.3331297","DOIUrl":"https://doi.org/10.1145/3331184.3331297","url":null,"abstract":"Some extensions to search systems require support for multiple query processing. This is the case with query variations, i.e., different query formulations of the same information need. The results of their processing can be fused together to improve effectiveness, but this requires to traverse more than once the query terms' posting lists, thus prolonging the multiple query processing time. In this work, we propose an approach to optimize the processing of query variations to reduce their overall response time. Similarly to the standard Boolean model, we firstly represent a group of query variations as a logic function where Boolean variables represent query terms. We then apply factoring to such function, in order to produce a more compact but logically equivalent representation. The factored form is used to process the query variations in a single pass over the inverted index. We experimentally show that our approach can improve by up to 1.95× the mean processing time of a multiple query with no statistically significant degradation in terms of NDCG@10.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88568548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expert-Guided Entity Extraction using Expressive Rules","authors":"M. Kejriwal, Runqi Shao, Pedro A. Szekely","doi":"10.1145/3331184.3331392","DOIUrl":"https://doi.org/10.1145/3331184.3331392","url":null,"abstract":"Knowledge Graph Construction (KGC) is an important problem that has many domain-specific applications, including semantic search and predictive analytics. As sophisticated KGC algorithms continue to be proposed, an important, neglected use case is to empower domain experts who do not have much technical background to construct high-fidelity, interpretable knowledge graphs. Such domain experts are a valuable source of input because of their (both formal and learned) knowledge of the domain. In this demonstration paper, we present a system that allows domain experts to construct knowledge graphs by writing sophisticated rule-based entity extractors with minimal training, using a GUI-based editor that offers a range of complex facilities.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"357 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76510400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ENT Rank: Retrieving Entities for Topical Information Needs through Entity-Neighbor-Text Relations","authors":"Laura Dietz","doi":"10.1145/3331184.3331257","DOIUrl":"https://doi.org/10.1145/3331184.3331257","url":null,"abstract":"Related work has demonstrated the helpfulness of utilizing information about entities in text retrieval; here we explore the converse: Utilizing information about text in entity retrieval. We model the relevance of Entity-Neighbor-Text (ENT) relations to derive a learning-to-rank-entities model. We focus on the task of retrieving (multiple) relevant entities in response to a topical information need such as \"Zika fever\". The ENT Rank model is designed to exploit semi-structured knowledge resources such as Wikipedia for entity retrieval. The ENT Rank model combines (1) established features of entity-relevance, with (2) information from neighboring entities (co-mentioned or mentioned-on-page) through (3) relevance scores of textual contexts through traditional retrieval models such as BM25 and RM3.","PeriodicalId":20700,"journal":{"name":"Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89570677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}